Preface


Our original purpose in writing this book was to provide a text for the
undergraduate linear algebra course at the Massachusetts Institute of
Technology. This course was designed for mathematics majors at the junior
level, although three-fourths of the students were drawn from other
scientific and technological disciplines and ranged from freshmen through
graduate students. This description of the M.I.T. audience for the text
remains generally accurate today. The ten years since the first edition
have seen the proliferation of linear algebra courses throughout the
country and have afforded one of the authors the opportunity to teach the
basic material to a variety of groups at Brandeis University, Washington
University (St. Louis), and the University of California (Irvine).

Our principal aim in revising Linear Algebra has been to increase the
variety of courses which can easily be taught from it. On one hand, we have
structured the chapters, especially the more difficult ones, so that there
are several natural stopping points along the way, allowing the instructor
in a one-quarter or one-semester course to exercise a considerable amount
of choice in the subject matter. On the other hand, we have increased the
amount of material in the text, so that it can be used for a rather
comprehensive one-year course in linear algebra and even as a reference
book for mathematicians.

The major changes have been in our treatments of canonical forms and inner
product spaces. In Chapter 6 we no longer begin with the general spatial
theory which underlies the theory of canonical forms. We first handle
characteristic values in relation to triangulation and diagonalization
theorems and then build our way up to the general theory. We have split
Chapter 8 so that the basic material on inner product spaces and unitary
diagonalization is followed by a Chapter 9 which treats sesqui-linear forms
and the more sophisticated properties of normal operators, including normal
operators on real inner product spaces.

We have also made a number of small changes and improvements from the 
first edition. But the basic philosophy behind the text is unchanged. 

We have made no particular concession to the fact that the majority of the
students may not be primarily interested in mathematics. For we believe a
mathematics course should not give science, engineering, or social science
students a hodgepodge of techniques, but should provide them with an
understanding of basic mathematical concepts.
On the other hand, we have been keenly aware of the wide range of
backgrounds which the students may possess and, in particular, of the fact
that the students have had very little experience with abstract
mathematical reasoning. For this reason, we have avoided the introduction
of too many abstract ideas at the very beginning of the book. In addition,
we have included an Appendix which presents such basic ideas as set,
function, and equivalence relation. We have found it most profitable not
to dwell on these ideas independently, but to advise the students to read
the Appendix when these ideas arise.

Throughout the book we have included a great variety of examples of the
important concepts which occur. The study of such examples is of
fundamental importance and tends to minimize the number of students who can
repeat definition, theorem, proof in logical order without grasping the
meaning of the abstract concepts. The book also contains a wide variety of
graded exercises (about six hundred), ranging from routine applications to
ones which will extend the very best students. These exercises are intended
to be an important part of the text.

Chapter 1 deals with systems of linear equations and their solution by
means of elementary row operations on matrices. It has been our practice to
spend about six lectures on this material. It provides the student with
some picture of the origins of linear algebra and with the computational
technique necessary to understand examples of the more abstract ideas
occurring in the later chapters. Chapter 2 deals with vector spaces,
subspaces, bases, and dimension. Chapter 3 treats linear transformations,
their algebra, their representation by matrices, as well as isomorphism,
linear functionals, and dual spaces. Chapter 4 defines the algebra of
polynomials over a field, the ideals in that algebra, and the prime
factorization of a polynomial. It also deals with roots, Taylor’s formula,
and the Lagrange interpolation formula. Chapter 5 develops determinants of
square matrices, the determinant being viewed as an alternating n-linear
function of the rows of a matrix, and then proceeds to multilinear
functions on modules as well as the Grassman ring. The material on modules
places the concept of determinant in a wider and more comprehensive setting
than is usually found in elementary textbooks. Chapters 6 and 7 contain a
discussion of the concepts which are basic to the analysis of a single
linear transformation on a finite-dimensional vector space; the analysis of
characteristic (eigen) values, triangulable and diagonalizable
transformations; the concepts of the diagonalizable and nilpotent parts of
a more general transformation, and the rational and Jordan canonical forms.
The primary and cyclic decomposition theorems play a central role, the
latter being arrived at through the study of admissible subspaces. Chapter
7 includes a discussion of matrices over a polynomial domain, the
computation of invariant factors and elementary divisors of a matrix, and
the development of the Smith canonical form. The chapter ends with a
discussion of semi-simple operators, to round out the analysis of a single
operator. Chapter 8 treats finite-dimensional inner product spaces in some
detail. It covers the basic geometry, relating orthogonalization to the
idea of ‘best approximation to a vector’ and leading to the concepts of the
orthogonal projection of a vector onto a subspace and the orthogonal
complement of a subspace. The chapter treats unitary operators and
culminates in the diagonalization of self-adjoint and normal operators.
Chapter 9 introduces sesqui-linear forms, relates them to positive and
self-adjoint operators on an inner product space, moves on to the spectral
theory of normal operators and then to more sophisticated results
concerning normal operators on real or complex inner product spaces.
Chapter 10 discusses bilinear forms, emphasizing canonical forms for
symmetric and skew-symmetric forms, as well as groups preserving
non-degenerate forms, especially the orthogonal, unitary,
pseudo-orthogonal and Lorentz groups.

We feel that any course which uses this text should cover Chapters 1, 2,
and 3 thoroughly, possibly excluding Sections 3.6 and 3.7 which deal with
the double dual and the transpose of a linear transformation. Chapters 4
and 5, on polynomials and determinants, may be treated with varying degrees
of thoroughness. In fact, polynomial ideals and basic properties of
determinants may be covered quite sketchily without serious damage to the
flow of the logic in the text; however, our inclination is to deal with
these chapters carefully (except the results on modules), because the
material illustrates so well the basic ideas of linear algebra. An
elementary course may now be concluded nicely with the first four sections
of Chapter 6, together with (the new) Chapter 8. If the rational and Jordan
forms are to be included, a more extensive coverage of Chapter 6 is
necessary.

Our indebtedness remains to those who contributed to the first edition,
especially to Professors Harry Furstenberg, Louis Howard, Daniel Kan,
Edward Thorp, to Mrs. Judith Bowers, Mrs. Betty Ann (Sargent) Rose and Miss
Phyllis Ruby. In addition, we would like to thank the many students and
colleagues whose perceptive comments led to this revision, and the staff of
Prentice-Hall for their patience in dealing with two authors caught in the
throes of academic administration. Lastly, special thanks are due to Mrs.
Sophia Koulouras for both her skill and her tireless efforts in typing the
revised manuscript.


K. M. H. / R. A. K. 


Contents


Chapter 1. Linear Equations
    1.1. Fields
    1.2. Systems of Linear Equations
    1.3. Matrices and Elementary Row Operations
    1.4. Row-Reduced Echelon Matrices
    1.5. Matrix Multiplication
    1.6. Invertible Matrices

Chapter 2. Vector Spaces
    2.1. Vector Spaces
    2.2. Subspaces
    2.3. Bases and Dimension
    2.4. Coordinates
    2.5. Summary of Row-Equivalence
    2.6. Computations Concerning Subspaces

Chapter 3. Linear Transformations
    3.1. Linear Transformations
    3.2. The Algebra of Linear Transformations
    3.3. Isomorphism
    3.4. Representation of Transformations by Matrices
    3.5. Linear Functionals
    3.6. The Double Dual
    3.7. The Transpose of a Linear Transformation

Chapter 4. Polynomials
    4.1. Algebras
    4.2. The Algebra of Polynomials
    4.3. Lagrange Interpolation
    4.4. Polynomial Ideals
    4.5. The Prime Factorization of a Polynomial

Chapter 5. Determinants
    5.1. Commutative Rings
    5.2. Determinant Functions
    5.3. Permutations and the Uniqueness of Determinants
    5.4. Additional Properties of Determinants
    5.5. Modules
    5.6. Multilinear Functions
    5.7. The Grassman Ring

Chapter 6. Elementary Canonical Forms
    6.1. Introduction
    6.2. Characteristic Values
    6.3. Annihilating Polynomials
    6.4. Invariant Subspaces
    6.5. Simultaneous Triangulation; Simultaneous Diagonalization
    6.6. Direct-Sum Decompositions
    6.7. Invariant Direct Sums
    6.8. The Primary Decomposition Theorem

Chapter 7. The Rational and Jordan Forms
    7.1. Cyclic Subspaces and Annihilators
    7.2. Cyclic Decompositions and the Rational Form
    7.3. The Jordan Form
    7.4. Computation of Invariant Factors
    7.5. Summary; Semi-Simple Operators

Chapter 8. Inner Product Spaces
    8.1. Inner Products
    8.2. Inner Product Spaces
    8.3. Linear Functionals and Adjoints
    8.4. Unitary Operators
    8.5. Normal Operators

Chapter 9. Operators on Inner Product Spaces
    9.1. Introduction
    9.2. Forms on Inner Product Spaces
    9.3. Positive Forms
    9.4. More on Forms
    9.5. Spectral Theory
    9.6. Further Properties of Normal Operators

Chapter 10. Bilinear Forms
    10.1. Bilinear Forms
    10.2. Symmetric Bilinear Forms
    10.3. Skew-Symmetric Bilinear Forms
    10.4. Groups Preserving Bilinear Forms

Appendix
    A.1. Sets
    A.2. Functions
    A.3. Equivalence Relations
    A.4. Quotient Spaces
    A.5. Equivalence Relations in Linear Algebra
    A.6. The Axiom of Choice

Bibliography

Index


1. Linear Equations 


1.1. Fields 


We assume that the reader is familiar with the elementary algebra of 
real and complex numbers. For a large portion of this book the algebraic 
properties of numbers which we shall use are easily deduced from the 
following brief list of properties of addition and multiplication. We let F 
denote either the set of real numbers or the set of complex numbers. 


1. Addition is commutative,
$$x + y = y + x$$
for all x and y in F.

2. Addition is associative,
$$x + (y + z) = (x + y) + z$$
for all x, y, and z in F.

3. There is a unique element 0 (zero) in F such that x + 0 = x, for
every x in F.

4. To each x in F there corresponds a unique element (-x) in F such
that x + (-x) = 0.

5. Multiplication is commutative,
$$xy = yx$$
for all x and y in F.

6. Multiplication is associative,
$$x(yz) = (xy)z$$
for all x, y, and z in F.

7. There is a unique non-zero element 1 (one) in F such that x1 = x,
for every x in F.

8. To each non-zero x in F there corresponds a unique element $x^{-1}$
(or 1/x) in F such that $xx^{-1} = 1$.

9. Multiplication distributes over addition; that is, x(y + z) =
xy + xz, for all x, y, and z in F.


Suppose one has a set F of objects x, y, z, ... and two operations on
the elements of F as follows. The first operation, called addition,
associates with each pair of elements x, y in F an element (x + y) in F;
the second operation, called multiplication, associates with each pair
x, y an element xy in F; and these two operations satisfy conditions
(1)-(9) above. The set F, together with these two operations, is then
called a field. Roughly speaking, a field is a set together with some
operations on the objects in that set which behave like ordinary addition,
subtraction, multiplication, and division of numbers in the sense that
they obey the nine rules of algebra listed above. With the usual
operations of addition and multiplication, the set C of complex numbers is
a field, as is the set R of real numbers.

For most of this book the ‘numbers’ we use may as well be the elements
from any field F. To allow for this generality, we shall use the word
‘scalar’ rather than ‘number.’ Not much will be lost to the reader if he
always assumes that the field of scalars is a subfield of the field of
complex numbers. A subfield of the field C is a set F of complex numbers
which is itself a field under the usual operations of addition and
multiplication of complex numbers. This means that 0 and 1 are in the set
F, and that if x and y are elements of F, so are (x + y), -x, xy, and
$x^{-1}$ (if x ≠ 0). An example of such a subfield is the field R of real
numbers; for, if we identify the real numbers with the complex numbers
(a + ib) for which b = 0, the 0 and 1 of the complex field are real
numbers, and if x and y are real, so are (x + y), -x, xy, and $x^{-1}$
(if x ≠ 0). We shall give other examples below. The point of our
discussing subfields is essentially this: If we are working with scalars
from a certain subfield of C, then the performance of the operations of
addition, subtraction, multiplication, or division on these scalars does
not take us out of the given subfield.


EXAMPLE 1. The set of positive integers: 1, 2, 3, ..., is not a
subfield of C, for a variety of reasons. For example, 0 is not a positive
integer; for no positive integer n is -n a positive integer; for no
positive integer n except 1 is 1/n a positive integer.


EXAMPLE 2. The set of integers: ..., -2, -1, 0, 1, 2, ..., is not a
subfield of C, because for an integer n, 1/n is not an integer unless n is
1 or -1. With the usual operations of addition and multiplication, the set
of integers satisfies all of the conditions (1)-(9) except condition (8).


EXAMPLE 3. The set of rational numbers, that is, numbers of the form
p/q, where p and q are integers and q ≠ 0, is a subfield of the field of
complex numbers. The division which is not possible within the set of
integers is possible within the set of rational numbers. The interested
reader should verify that any subfield of C must contain every rational
number.


EXAMPLE 4. The set of all complex numbers of the form $x + y\sqrt{2}$,
where x and y are rational, is a subfield of C. We leave it to the reader
to verify this.
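The closure claim in Example 4 can be spot-checked with exact rational arithmetic. The Python sketch below is our own illustration, not the text's method: storing $x + y\sqrt{2}$ as the pair (x, y) of `Fraction`s is an assumed encoding, and the helper names `add`, `neg`, `mul`, `inv` are hypothetical. Each operation returns another rational pair, which is exactly what closure means. The inverse uses $1/(x + y\sqrt{2}) = (x - y\sqrt{2})/(x^2 - 2y^2)$.

```python
from fractions import Fraction

# Hypothetical encoding: x + y*sqrt(2), with x and y rational, is stored as the
# pair (x, y). Closure means every operation below returns such a pair.

def add(a, b):
    return (a[0] + b[0], a[1] + b[1])

def neg(a):
    return (-a[0], -a[1])

def mul(a, b):
    # (x1 + y1 r)(x2 + y2 r) = (x1 x2 + 2 y1 y2) + (x1 y2 + y1 x2) r, using r*r = 2
    return (a[0] * b[0] + 2 * a[1] * b[1], a[0] * b[1] + a[1] * b[0])

def inv(a):
    # 1/(x + y r) = (x - y r) / (x^2 - 2 y^2). For rational (x, y) != (0, 0)
    # the denominator is non-zero because sqrt(2) is irrational.
    x, y = Fraction(a[0]), Fraction(a[1])
    d = x * x - 2 * y * y
    if d == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    return (x / d, -y / d)

a = (Fraction(1), Fraction(2))       # 1 + 2*sqrt(2)
b = (Fraction(1, 3), Fraction(-1))   # 1/3 - sqrt(2)
assert mul(a, inv(a)) == (1, 0)      # a * a^{-1} = 1 + 0*sqrt(2)
```

A finite check like this illustrates the claim but does not replace the proof asked for in Exercise 1 below.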


In the examples and exercises of this book, the reader should assume
that the field involved is a subfield of the complex numbers, unless it is
expressly stated that the field is more general. We do not want to dwell
on this point; however, we should indicate why we adopt such a
convention. If F is a field, it may be possible to add the unit 1 to
itself a finite number of times and obtain 0 (see Exercise 5 following
Section 1.2):
$$1 + 1 + \cdots + 1 = 0.$$
That does not happen in the complex number field (or in any subfield
thereof). If it does happen in F, then the least n such that the sum of n
1’s is 0 is called the characteristic of the field F. If it does not
happen in F, then (for some strange reason) F is called a field of
characteristic zero. Often, when we assume F is a subfield of C, what we
want to guarantee is that F is a field of characteristic zero; but, in a
first exposure to linear algebra, it is usually better not to worry too
much about characteristics of fields.
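The definition of characteristic translates directly into a search for the least n. The Python sketch below assumes, for illustration only, the integers modulo a prime p (fields not discussed in the text; modulo 2 they are the two-element field of Exercise 5 following Section 1.2). The function name `characteristic` and its `max_terms` cut-off are hypothetical. A finite search can never prove characteristic zero, so the sketch merely reports 0 when no n up to the cut-off works.

```python
from fractions import Fraction

def characteristic(one, zero, add, max_terms=1000):
    """Least n such that 1 + 1 + ... + 1 (n terms) equals zero.

    Returns 0 if no such n <= max_terms is found; this only means the search
    failed, which is consistent with (but does not prove) characteristic zero."""
    total = one                        # the sum of n ones, starting with n = 1
    for n in range(1, max_terms + 1):
        if total == zero:
            return n
        total = add(total, one)
    return 0

def mod_add(p):
    # Addition in the integers modulo p (an assumed example field for prime p).
    return lambda a, b: (a + b) % p

print(characteristic(1, 0, mod_add(2)))   # 2: the two-element field
print(characteristic(1, 0, mod_add(7)))   # 7
print(characteristic(Fraction(1), Fraction(0), lambda a, b: a + b))  # 0 for the rationals
```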


1.2. Systems of Linear Equations 


Suppose F is a field. We consider the problem of finding n scalars
(elements of F) $x_1, \ldots, x_n$ which satisfy the conditions
$$\begin{aligned}
A_{11}x_1 + A_{12}x_2 + \cdots + A_{1n}x_n &= y_1 \\
A_{21}x_1 + A_{22}x_2 + \cdots + A_{2n}x_n &= y_2 \\
&\;\;\vdots \\
A_{m1}x_1 + A_{m2}x_2 + \cdots + A_{mn}x_n &= y_m
\end{aligned} \tag{1-1}$$
where $y_1, \ldots, y_m$ and $A_{ij}$, $1 \le i \le m$, $1 \le j \le n$, are given
elements of F. We call (1-1) a system of m linear equations in n
unknowns. Any n-tuple $(x_1, \ldots, x_n)$ of elements of F which satisfies
each of the equations in (1-1) is called a solution of the system. If
$y_1 = y_2 = \cdots = y_m = 0$, we say that the system is homogeneous, or
that each of the equations is homogeneous.

Perhaps the most fundamental technique for finding the solutions 
of a system of linear equations is the technique of elimination. We can 
illustrate this technique on the homogeneous system 


$$\begin{aligned} 2x_1 - x_2 + x_3 &= 0 \\ x_1 + 3x_2 + 4x_3 &= 0. \end{aligned}$$
If we add (-2) times the second equation to the first equation, we obtain
$$-7x_2 - 7x_3 = 0$$
or, $x_2 = -x_3$. If we add 3 times the first equation to the second
equation, we obtain
$$7x_1 + 7x_3 = 0$$
or, $x_1 = -x_3$. So we conclude that if $(x_1, x_2, x_3)$ is a solution then
$x_1 = x_2 = -x_3$. Conversely, one can readily verify that any such triple
is a solution. Thus the set of solutions consists of all triples
(-a, -a, a).
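The two eliminations above can be replayed mechanically. In this Python sketch (our own illustration; the helper names `combine` and `satisfies` are hypothetical), each homogeneous equation is stored as its list of coefficients, so forming c1 times one equation plus c2 times another is a coefficientwise sum. The sketch then confirms on a few sample values of a that the triples (-a, -a, a) satisfy both original equations.

```python
from fractions import Fraction

# Coefficients of the homogeneous system
#   2x1 -  x2 +  x3 = 0
#    x1 + 3x2 + 4x3 = 0
eq1 = [2, -1, 1]
eq2 = [1, 3, 4]

def combine(c1, e1, c2, e2):
    """Coefficients of c1*(equation e1) + c2*(equation e2)."""
    return [c1 * a + c2 * b for a, b in zip(e1, e2)]

print(combine(1, eq1, -2, eq2))  # first + (-2)*second: [0, -7, -7]
print(combine(3, eq1, 1, eq2))   # 3*first + second:    [7, 0, 7]

def satisfies(eq, x):
    """True if x makes the homogeneous equation eq equal to 0."""
    return sum(a * xi for a, xi in zip(eq, x)) == 0

# Every triple (-a, -a, a) solves both equations (checked on sample values).
for a in [Fraction(1), Fraction(-3), Fraction(2, 5)]:
    x = (-a, -a, a)
    assert satisfies(eq1, x) and satisfies(eq2, x)
```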

We found the solutions to this system of equations by ‘eliminating
unknowns,’ that is, by multiplying equations by scalars and then adding to
produce equations in which some of the $x_j$ were not present. We wish to
formalize this process slightly so that we may understand why it works,
and so that we may carry out the computations necessary to solve a system
in an organized manner.

For the general system (1-1), suppose we select m scalars $c_1, \ldots, c_m$,
multiply the jth equation by $c_j$ and then add. We obtain the equation
$$(c_1A_{11} + \cdots + c_mA_{m1})x_1 + \cdots + (c_1A_{1n} + \cdots + c_mA_{mn})x_n
= c_1y_1 + \cdots + c_my_m.$$
Such an equation we shall call a linear combination of the equations in
(1-1). Evidently, any solution of the entire system of equations (1-1)
will also be a solution of this new equation. This is the fundamental idea
of the elimination process. If we have another system of linear equations
$$\begin{aligned}
B_{11}x_1 + \cdots + B_{1n}x_n &= z_1 \\
&\;\;\vdots \\
B_{k1}x_1 + \cdots + B_{kn}x_n &= z_k
\end{aligned} \tag{1-2}$$
in which each of the k equations is a linear combination of the equations
in (1-1), then every solution of (1-1) is a solution of this new system.
Of course it may happen that some solutions of (1-2) are not solutions of
(1-1). This clearly does not happen if each equation in the original
system is a linear combination of the equations in the new system. Let us
say that two systems of linear equations are equivalent if each equation
in each system is a linear combination of the equations in the other
system. We can then formally state our observations as follows.




Theorem 1. Equivalent systems of linear equations have exactly the 
same solutions. 


If the elimination process is to be effective in finding the solutions of
a system like (1-1), then one must see how, by forming linear combinations
of the given equations, to produce an equivalent system of equations which
is easier to solve. In the next section we shall discuss one method of
doing this.
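The basic observation behind Theorem 1, that a solution of the whole system also solves every linear combination of its equations, can be checked concretely. In this Python sketch each equation of a system like (1-1) is stored as the row $[A_{i1}, \ldots, A_{in}, y_i]$; that layout, the helper names `linear_combination` and `is_solution`, and the small two-equation system are all hypothetical choices made only for illustration.

```python
from fractions import Fraction

def linear_combination(coeffs, rows):
    """Given scalars c_1..c_m and equations stored as rows [A_i1, ..., A_in, y_i],
    return the row of the equation sum_i c_i * (equation i)."""
    width = len(rows[0])
    return [sum(c * row[j] for c, row in zip(coeffs, rows)) for j in range(width)]

def is_solution(row, x):
    """True if x satisfies the single equation stored in row."""
    *a, y = row
    return sum(aj * xj for aj, xj in zip(a, x)) == y

# Hypothetical non-homogeneous system, chosen only for illustration:
#    x1 + 2x2 = 5
#   3x1 -  x2 = 1
rows = [[1, 2, 5], [3, -1, 1]]
x = (Fraction(1), Fraction(2))          # solves both equations
assert all(is_solution(r, x) for r in rows)

combo = linear_combination([Fraction(2), Fraction(-7, 3)], rows)
assert is_solution(combo, x)            # so it also solves this combination
```

The converse direction, that a solution of a combination need not solve the system, is why Theorem 1 requires each system to be built from the other.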


Exercises 


1. Verify that the set of complex numbers described in Example 4 is a
subfield of C.


2. Let F be the field of complex numbers. Are the following two systems of linear 
equations equivalent? If so, express each equation in each system as a linear 
combination of the equations in the other system. 


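$$\begin{aligned} x_1 - x_2 &= 0 \\ 2x_1 + x_2 &= 0 \end{aligned}
\qquad\qquad
\begin{aligned} 3x_1 + x_2 &= 0 \\ x_1 + x_2 &= 0 \end{aligned}$$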
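3. Test the following systems of equations as in Exercise 2.
$$\begin{aligned} -x_1 + x_2 + 4x_3 &= 0 \\ x_1 + 3x_2 + 8x_3 &= 0 \\ \tfrac12 x_1 + x_2 + \tfrac52 x_3 &= 0 \end{aligned}
\qquad\qquad
\begin{aligned} x_1 - x_3 &= 0 \\ x_2 + 3x_3 &= 0 \end{aligned}$$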
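4. Test the following systems as in Exercise 2.
$$\begin{aligned} 2x_1 + (-1 + i)x_2 + x_4 &= 0 \\ 3x_2 - 2ix_3 + 5x_4 &= 0 \end{aligned}
\qquad\qquad
\begin{aligned} \left(1 + \tfrac{i}{2}\right)x_1 + 8x_2 - ix_3 - x_4 &= 0 \\ \tfrac23 x_1 - \tfrac12 x_2 + x_3 + 7x_4 &= 0 \end{aligned}$$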


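5. Let F be a set which contains exactly two elements, 0 and 1. Define an
addition and multiplication by the tables:
$$\begin{array}{c|cc} + & 0 & 1 \\ \hline 0 & 0 & 1 \\ 1 & 1 & 0 \end{array}
\qquad\qquad
\begin{array}{c|cc} \cdot & 0 & 1 \\ \hline 0 & 0 & 0 \\ 1 & 0 & 1 \end{array}$$
Verify that the set F, together with these two operations, is a field.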


6. Prove that if two homogeneous systems of linear equations in two unknowns 
have the same solutions, then they are equivalent. 


7. Prove that each subfield of the field of complex numbers contains every 
rational number. 


8. Prove that each field of characteristic zero contains a copy of the rational 
number field. 


1.3. Matrices and Elementary Row Operations


One cannot fail to notice that in forming linear combinations of linear
equations there is no need to continue writing the ‘unknowns’
$x_1, \ldots, x_n$, since one actually computes only with the coefficients
$A_{ij}$ and the scalars $y_i$. We shall now abbreviate the system (1-1) by
$$AX = Y$$
where
$$A = \begin{bmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mn} \end{bmatrix},
\qquad
X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
\quad\text{and}\quad
Y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}.$$

We call A the matrix of coefficients of the system. Strictly speaking,
the rectangular array displayed above is not a matrix, but is a
representation of a matrix. An m × n matrix over the field F is a function
A from the set of pairs of integers (i, j), 1 ≤ i ≤ m, 1 ≤ j ≤ n, into the
field F. The entries of the matrix A are the scalars $A(i, j) = A_{ij}$, and
quite often it is most convenient to describe the matrix by displaying its
entries in a rectangular array having m rows and n columns, as above.
Thus X (above) is, or defines, an n × 1 matrix and Y is an m × 1 matrix.
For the time being, AX = Y is nothing more than a shorthand notation for
our system of linear equations. Later, when we have defined a
multiplication for matrices, it will mean that Y is the product of A
and X.
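The definition of a matrix as a function on pairs (i, j) can be taken literally. In the Python sketch below, a dictionary keyed by 1-based pairs plays the role of that function; the helper names `matrix_from_rows` and `shape` are hypothetical, and the sample entries are those of the 3 × 4 matrix used in Example 5. The rectangular array is then just one way of listing the function's values.

```python
from fractions import Fraction

def matrix_from_rows(rows):
    """Build an m x n matrix as a function (here a dict) on the pairs (i, j),
    1 <= i <= m, 1 <= j <= n, using the 1-based indices of the text."""
    return {(i, j): Fraction(entry)
            for i, row in enumerate(rows, start=1)
            for j, entry in enumerate(row, start=1)}

def shape(A):
    """Recover (m, n) from the domain of the function A."""
    return max(i for i, _ in A), max(j for _, j in A)

A = matrix_from_rows([[2, -1, 3, 2], [1, 4, 0, -1], [2, 6, -1, 5]])
print(A[(1, 2)])   # the entry A_12, which is -1
print(shape(A))    # (3, 4)
```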

We wish now to consider operations on the rows of the matrix A which
correspond to forming linear combinations of the equations in the system
AX = Y. We restrict our attention to three elementary row operations on
an m × n matrix A over the field F:

1. multiplication of one row of A by a non-zero scalar c;

2. replacement of the rth row of A by row r plus c times row s, c any
scalar and r ≠ s;

3. interchange of two rows of A.

An elementary row operation is thus a special type of function (rule) e
which associates with each m × n matrix A an m × n matrix e(A). One can
precisely describe e in the three cases as follows:

1. $e(A)_{ij} = A_{ij}$ if $i \neq r$, $e(A)_{rj} = cA_{rj}$.

2. $e(A)_{ij} = A_{ij}$ if $i \neq r$, $e(A)_{rj} = A_{rj} + cA_{sj}$.

3. $e(A)_{ij} = A_{ij}$ if i is different from both r and s, $e(A)_{rj} = A_{sj}$,
$e(A)_{sj} = A_{rj}$.
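The three formulas for e(A) can be written as functions that return a new matrix and leave A unchanged. In this Python sketch matrices are lists of rows of `Fraction`s, and r, s are 1-based row numbers to match the text; the function names are hypothetical. The usage line reproduces the first step of Example 5 below (row 1 plus (-2) times row 2).

```python
from fractions import Fraction

# Matrices are lists of rows; r and s are 1-based row numbers, as in the text.

def scale_row(A, r, c):
    """Type (1): multiply row r by the non-zero scalar c."""
    if c == 0:
        raise ValueError("c must be non-zero")
    B = [row[:] for row in A]
    B[r - 1] = [c * a for a in B[r - 1]]
    return B

def add_multiple(A, r, s, c):
    """Type (2): replace row r by row r plus c times row s (r != s)."""
    if r == s:
        raise ValueError("r and s must differ")
    B = [row[:] for row in A]
    B[r - 1] = [a + c * b for a, b in zip(B[r - 1], B[s - 1])]
    return B

def swap_rows(A, r, s):
    """Type (3): interchange rows r and s."""
    B = [row[:] for row in A]
    B[r - 1], B[s - 1] = B[s - 1], B[r - 1]
    return B

A = [[Fraction(v) for v in row] for row in [[2, -1, 3, 2], [1, 4, 0, -1], [2, 6, -1, 5]]]
B = add_multiple(A, 1, 2, Fraction(-2))   # row 1 plus (-2) times row 2
```

Because each function copies the rows first, A itself is never modified, which mirrors the view of e as a function from matrices to matrices.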
In defining e(A), it is not really important how many columns A has, but
the number of rows of A is crucial. For example, one must worry a little
to decide what is meant by interchanging rows 5 and 6 of a 5 × 5 matrix.
To avoid any such complications, we shall agree that an elementary row
operation e is defined on the class of all m × n matrices over F, for some
fixed m but any n. In other words, a particular e is defined on the class
of all m-rowed matrices over F.

One reason that we restrict ourselves to these three simple types of 
row operations is that, having performed such an operation e on a matrix 
A, we can recapture A by performing a similar operation on e(A). 


Theorem 2. To each elementary row operation e there corresponds an
elementary row operation $e_1$, of the same type as e, such that
$e_1(e(A)) = e(e_1(A)) = A$ for each A. In other words, the inverse operation
(function) of an elementary row operation exists and is an elementary row
operation of the same type.


Proof. (1) Suppose e is the operation which multiplies the rth row of a
matrix by the non-zero scalar c. Let $e_1$ be the operation which multiplies
row r by $c^{-1}$. (2) Suppose e is the operation which replaces row r by row r
plus c times row s, r ≠ s. Let $e_1$ be the operation which replaces row r by
row r plus (-c) times row s. (3) If e interchanges rows r and s, let
$e_1 = e$. In each of these three cases we clearly have $e_1(e(A)) = e(e_1(A)) = A$
for each A. ∎
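The choice of $e_1$ in each case of the proof can be tested directly. This Python sketch restates the three operations compactly (same conventions as before: lists of `Fraction` rows, 1-based r and s) and adds a hypothetical helper `inverse_pair` that returns e together with the $e_1$ chosen in the proof. It then checks $e_1(e(A)) = e(e_1(A)) = A$ for one operation of each type.

```python
from fractions import Fraction

# Compact versions of the three operations (r, s are 1-based row numbers).
def scale_row(A, r, c):
    return [[c * a for a in row] if i == r else row[:] for i, row in enumerate(A, 1)]

def add_multiple(A, r, s, c):
    return [[a + c * b for a, b in zip(row, A[s - 1])] if i == r else row[:]
            for i, row in enumerate(A, 1)]

def swap_rows(A, r, s):
    B = [row[:] for row in A]
    B[r - 1], B[s - 1] = B[s - 1], B[r - 1]
    return B

def inverse_pair(kind, r, s=None, c=None):
    """Return (e, e1) as functions, with e1 the inverse chosen in the proof."""
    if kind == 1:   # multiply row r by c; inverse multiplies by 1/c
        return (lambda A: scale_row(A, r, c), lambda A: scale_row(A, r, 1 / c))
    if kind == 2:   # row r plus c times row s; inverse uses -c
        return (lambda A: add_multiple(A, r, s, c), lambda A: add_multiple(A, r, s, -c))
    # interchange rows r and s; it is its own inverse
    return (lambda A: swap_rows(A, r, s), lambda A: swap_rows(A, r, s))

A = [[Fraction(v) for v in row] for row in [[2, -1, 3, 2], [1, 4, 0, -1], [2, 6, -1, 5]]]
for args in [(1, 2, None, Fraction(3)), (2, 3, 1, Fraction(-5, 2)), (3, 1, 3, None)]:
    e, e1 = inverse_pair(*args)
    assert e1(e(A)) == A and e(e1(A)) == A
```

The scalar c is kept as a `Fraction` so that 1/c is exact; with a plain integer c, `1 / c` would produce a float.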


Definition. If A and B are m × n matrices over the field F, we say that
B is row-equivalent to A if B can be obtained from A by a finite sequence
of elementary row operations.


Using Theorem 2, the reader should find it easy to verify the following. 
Each matrix is row-equivalent to itself; if B is row-equivalent to A, then A 
is row-equivalent to B; if B is row-equivalent to A and C is row-equivalent 
to B, then C is row-equivalent to A. In other words, row-equivalence is 
an equivalence relation (see Appendix). 


Theorem 3. If A and B are row-equivalent m × n matrices, the homogeneous
systems of linear equations AX = 0 and BX = 0 have exactly the same
solutions.


Proof. Suppose we pass from A to B by a finite sequence of elementary
row operations:
$$A = A_0 \to A_1 \to \cdots \to A_k = B.$$
It is enough to prove that the systems $A_jX = 0$ and $A_{j+1}X = 0$ have the
same solutions, i.e., that one elementary row operation does not disturb
the set of solutions.
So suppose that B is obtained from A by a single elementary row
operation. No matter which of the three types the operation is, (1), (2),
or (3), each equation in the system BX = 0 will be a linear combination of
the equations in the system AX = 0. Since the inverse of an elementary row
operation is an elementary row operation, each equation in AX = 0 will
also be a linear combination of the equations in BX = 0. Hence these two
systems are equivalent, and by Theorem 1 they have the same solutions. ∎


EXAMPLE 5. Suppose F is the field of rational numbers, and
$$A = \begin{bmatrix} 2 & -1 & 3 & 2 \\ 1 & 4 & 0 & -1 \\ 2 & 6 & -1 & 5 \end{bmatrix}.$$

We shall perform a finite sequence of elementary row operations on A, 
indicating by numbers in parentheses the type of operation performed. 


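$$\begin{aligned}
A &\xrightarrow{(2)}
\begin{bmatrix} 0 & -9 & 3 & 4 \\ 1 & 4 & 0 & -1 \\ 2 & 6 & -1 & 5 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 0 & -9 & 3 & 4 \\ 1 & 4 & 0 & -1 \\ 0 & -2 & -1 & 7 \end{bmatrix}
\xrightarrow{(1)}
\begin{bmatrix} 0 & -9 & 3 & 4 \\ 1 & 4 & 0 & -1 \\ 0 & 1 & \tfrac12 & -\tfrac72 \end{bmatrix} \\
&\xrightarrow{(2)}
\begin{bmatrix} 0 & 0 & \tfrac{15}{2} & -\tfrac{55}{2} \\ 1 & 0 & -2 & 13 \\ 0 & 1 & \tfrac12 & -\tfrac72 \end{bmatrix}
\xrightarrow{(1)}
\begin{bmatrix} 0 & 0 & 1 & -\tfrac{11}{3} \\ 1 & 0 & -2 & 13 \\ 0 & 1 & \tfrac12 & -\tfrac72 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 0 & 0 & 1 & -\tfrac{11}{3} \\ 1 & 0 & 0 & \tfrac{17}{3} \\ 0 & 1 & 0 & -\tfrac53 \end{bmatrix}
\end{aligned}$$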

The row-equivalence of A with the final matrix in the above sequence
tells us in particular that the solutions of
$$\begin{aligned} 2x_1 - x_2 + 3x_3 + 2x_4 &= 0 \\ x_1 + 4x_2 - x_4 &= 0 \\ 2x_1 + 6x_2 - x_3 + 5x_4 &= 0 \end{aligned}$$
and
$$\begin{aligned} x_3 - \tfrac{11}{3}x_4 &= 0 \\ x_1 + \tfrac{17}{3}x_4 &= 0 \\ x_2 - \tfrac{5}{3}x_4 &= 0 \end{aligned}$$
are exactly the same. In the second system it is apparent that if we
assign any rational value c to $x_4$ we obtain a solution
$\left(-\tfrac{17}{3}c, \tfrac{5}{3}c, \tfrac{11}{3}c, c\right)$, and also that every
solution is of this form.
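Theorem 3 predicts that A and the final matrix of Example 5 have the same null solutions. This Python sketch (our own check, with a hypothetical helper `solves`) confirms that the one-parameter family found above solves both homogeneous systems for several sample values of c. It is a spot check of the example, not a proof of the theorem.

```python
from fractions import Fraction as F

A = [[2, -1, 3, 2], [1, 4, 0, -1], [2, 6, -1, 5]]
R = [[0, 0, 1, F(-11, 3)], [1, 0, 0, F(17, 3)], [0, 1, 0, F(-5, 3)]]  # final matrix

def solves(M, x):
    """True if x solves the homogeneous system MX = 0."""
    return all(sum(a * xi for a, xi in zip(row, x)) == 0 for row in M)

for c in [F(1), F(3), F(-2, 7)]:
    x = (F(-17, 3) * c, F(5, 3) * c, F(11, 3) * c, c)
    assert solves(A, x) and solves(R, x)
```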


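EXAMPLE 6. Suppose F is the field of complex numbers and
$$A = \begin{bmatrix} -1 & i \\ -i & 3 \\ 1 & 2 \end{bmatrix}.$$

In performing row operations it is often convenient to combine several
operations of type (2). With this in mind
$$\begin{bmatrix} -1 & i \\ -i & 3 \\ 1 & 2 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 0 & 2+i \\ 0 & 3+2i \\ 1 & 2 \end{bmatrix}
\xrightarrow{(1)}
\begin{bmatrix} 0 & 1 \\ 0 & 3+2i \\ 1 & 2 \end{bmatrix}
\xrightarrow{(2)}
\begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{bmatrix}.$$
Thus the system of equations
$$\begin{aligned} -x_1 + ix_2 &= 0 \\ -ix_1 + 3x_2 &= 0 \\ x_1 + 2x_2 &= 0 \end{aligned}$$
has only the trivial solution $x_1 = x_2 = 0$.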


In Examples 5 and 6 we were obviously not performing row operations at
random. Our choice of row operations was motivated by a desire to simplify
the coefficient matrix in a manner analogous to ‘eliminating unknowns’ in
the system of linear equations. Let us now make a formal definition of the
type of matrix at which we were attempting to arrive.


Definition. An m × n matrix R is called row-reduced if:

(a) the first non-zero entry in each non-zero row of R is equal to 1;
(b) each column of R which contains the leading non-zero entry of some
row has all its other entries 0.


EXAMPLE 7. One example of a row-reduced matrix is the n × n (square)
identity matrix I. This is the n × n matrix defined by
$$I_{ij} = \delta_{ij} = \begin{cases} 1, & \text{if } i = j \\ 0, & \text{if } i \neq j. \end{cases}$$
This is the first of many occasions on which we shall use the Kronecker
delta (δ).


In Examples 5 and 6, the final matrices in the sequences exhibited 
there are row-reduced matrices. Two examples of matrices which are not 
row-reduced are: 


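$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix},
\qquad
\begin{bmatrix} 0 & 2 & 1 \\ 1 & 0 & -3 \\ 0 & 0 & 0 \end{bmatrix}.$$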
The second matrix fails to satisfy condition (a), because the leading
non-zero entry of the first row is not 1. The first matrix does satisfy
condition (a), but fails to satisfy condition (b) in column 3.
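Conditions (a) and (b) can be checked row by row. This Python sketch uses hypothetical helper names (`leading_index`, `is_row_reduced`) and 0-based indices internally. It accepts the identity matrix and the final matrix of Example 5, and rejects the two matrices just discussed for the reasons given in the text.

```python
from fractions import Fraction as F

def leading_index(row):
    """0-based column of the first non-zero entry, or None for a zero row."""
    return next((j for j, a in enumerate(row) if a != 0), None)

def is_row_reduced(R):
    for i, row in enumerate(R):
        k = leading_index(row)
        if k is None:
            continue                  # zero rows impose no condition
        if row[k] != 1:               # condition (a)
            return False
        if any(R[t][k] != 0 for t in range(len(R)) if t != i):   # condition (b)
            return False
    return True

identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
not_reduced_1 = [[1, 0, 0, 0], [0, 1, -1, 0], [0, 0, 1, 0]]   # fails (b) in column 3
not_reduced_2 = [[0, 2, 1], [1, 0, -3], [0, 0, 0]]            # fails (a) in row 1
final_5 = [[0, 0, 1, F(-11, 3)], [1, 0, 0, F(17, 3)], [0, 1, 0, F(-5, 3)]]
```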

We shall now prove that we can pass from any given matrix to a
row-reduced matrix, by means of a finite number of elementary row
operations. In combination with Theorem 3, this will provide us with an
effective tool for solving systems of linear equations.


Theorem 4. Every m × n matrix over the field F is row-equivalent to a
row-reduced matrix.


Proof. Let A be an m × n matrix over F. If every entry in the first row
of A is 0, then condition (a) is satisfied in so far as row 1 is
concerned. If row 1 has a non-zero entry, let k be the smallest positive
integer j for which $A_{1j} \neq 0$. Multiply row 1 by $A_{1k}^{-1}$, and then
condition (a) is satisfied with regard to row 1. Now for each i ≥ 2, add
$(-A_{ik})$ times row 1 to row i. Now the leading non-zero entry of row 1
occurs in column k, that entry is 1, and every other entry in column k
is 0.

Now consider the matrix which has resulted from above. If every entry in
row 2 is 0, we do nothing to row 2. If some entry in row 2 is different
from 0, we multiply row 2 by a scalar so that the leading non-zero entry
is 1. In the event that row 1 had a leading non-zero entry in column k,
this leading non-zero entry of row 2 cannot occur in column k; say it
occurs in column $k' \neq k$. By adding suitable multiples of row 2 to the
various rows, we can arrange that all entries in column k' are 0, except
the 1 in row 2. The important thing to notice is this: In carrying out
these last operations, we will not change the entries of row 1 in columns
1, ..., k; nor will we change any entry of column k. Of course, if row 1
was identically 0, the operations with row 2 will not affect row 1.

Working with one row at a time in the above manner, it is clear that in a
finite number of steps we will arrive at a row-reduced matrix. ∎
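The proof of Theorem 4 is effectively an algorithm. The Python sketch below follows it row by row: it normalizes each non-zero row's leading entry with a type (1) operation and then clears the rest of that column with type (2) operations. It uses `Fraction` for exact arithmetic; the function name `row_reduce` is hypothetical. As in the proof, the result is row-reduced but its rows are not reordered. Applied to the matrix A of Example 5, it reaches the same rows as the final matrix there, only in a different order.

```python
from fractions import Fraction

def row_reduce(A):
    """Row-reduce A one row at a time, following the proof of Theorem 4.

    Entries are converted to Fraction so the arithmetic is exact.
    The result is row-reduced but not necessarily in echelon order."""
    R = [[Fraction(a) for a in row] for row in A]
    m = len(R)
    for i in range(m):
        # Find the leading non-zero entry of row i, if any.
        k = next((j for j, a in enumerate(R[i]) if a != 0), None)
        if k is None:
            continue                      # a zero row is left alone
        pivot = R[i][k]
        R[i] = [a / pivot for a in R[i]]  # type (1): make the leading entry 1
        for t in range(m):
            if t != i and R[t][k] != 0:   # type (2): clear the rest of column k
                factor = R[t][k]
                R[t] = [a - factor * b for a, b in zip(R[t], R[i])]
    return R

A = [[2, -1, 3, 2], [1, 4, 0, -1], [2, 6, -1, 5]]
R = row_reduce(A)
```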


Exercises 


1. Find all solutions to the system of equations
$$\begin{aligned} (1 - i)x_1 - ix_2 &= 0 \\ 2x_1 + (1 - i)x_2 &= 0. \end{aligned}$$


2. If A = …, find all solutions of AX = 0 by row-reducing A.

3. If
$$A = \begin{bmatrix} 6 & -4 & 0 \\ 4 & -2 & 0 \\ -1 & 0 & 3 \end{bmatrix}$$
find all solutions of AX = 2X and all solutions of AX = 3X. (The symbol cX
denotes the matrix each entry of which is c times the corresponding entry
of X.)


4. Find a row-reduced matrix which is row-equivalent to
$$A = \begin{bmatrix} i & -(1+i) & 0 \\ 1 & -2 & 1 \\ 1 & 2i & -1 \end{bmatrix}.$$
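5. Prove that the following two matrices are not row-equivalent:
$$\begin{bmatrix} 2 & 0 & 0 \\ a & -1 & 0 \\ b & c & 3 \end{bmatrix},
\qquad
\begin{bmatrix} 1 & 1 & 2 \\ -2 & 0 & -1 \\ 1 & 3 & 5 \end{bmatrix}.$$

6. Let
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
be a 2 × 2 matrix with complex entries. Suppose that A is row-reduced and
also that a + b + c + d = 0. Prove that there are exactly three such
matrices.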


7. Prove that the interchange of two rows of a matrix can be accomplished by a 
finite sequence of elementary row operations of the other two types. 


8. Consider the system of equations AX = 0 where
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
is a 2 × 2 matrix over the field F. Prove the following.

(a) If every entry of A is 0, then every pair $(x_1, x_2)$ is a solution of
AX = 0.

(b) If $ad - bc \neq 0$, the system AX = 0 has only the trivial solution
$x_1 = x_2 = 0$.

(c) If $ad - bc = 0$ and some entry of A is different from 0, then there is a
solution $(x_1^0, x_2^0)$ such that $(x_1, x_2)$ is a solution if and only if
there is a scalar y such that $x_1 = yx_1^0$, $x_2 = yx_2^0$.
1.4. Row-Reduced Echelon Matrices


Until now, our work with systems of linear equations was motivated 
by an attempt to find the solutions of such a system. In Section 1.3 we 
established a standardized technique for finding these solutions. We wish 
now to acquire some information which is slightly more theoretical, and 
for that purpose it is convenient to go a little beyond row-reduced matrices. 


Definition. An m × n matrix R is called a row-reduced echelon
matrix if:

(a) R is row-reduced;
(b) every row of R which has all its entries 0 occurs below every row
which has a non-zero entry;

(c) if rows 1, ..., r are the non-zero rows of R, and if the leading non-zero
entry of row i occurs in column $k_i$, i = 1, ..., r, then $k_1 < k_2 < \cdots < k_r$.


One can also describe an m × n row-reduced echelon matrix R as
follows. Either every entry in R is 0, or there exists a positive integer r,
1 ≤ r ≤ m, and r positive integers $k_1, \ldots, k_r$ with $1 \le k_i \le n$ and

(a) $R_{ij} = 0$ for i > r, and $R_{ij} = 0$ if $j < k_i$.

(b) $R_{ik_j} = \delta_{ij}$, $1 \le i \le r$, $1 \le j \le r$.

(c) $k_1 < \cdots < k_r$.


EXAMPLE 8. Two examples of row-reduced echelon matrices are the
n × n identity matrix, and the m × n zero matrix $0^{m,n}$, in which all
entries are 0. The reader should have no difficulty in making other examples,
but we should like to give one non-trivial one:

$$
\begin{bmatrix} 0 & 1 & -3 & 0 & \tfrac12 \\ 0 & 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.
$$


Theorem 5. Every m × n matrix A is row-equivalent to a row-reduced
echelon matrix.


Proof. We know that A is row-equivalent to a row-reduced
matrix. All that we need observe is that by performing a finite number of
row interchanges on a row-reduced matrix we can bring it to row-reduced
echelon form. ∎
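The proofs of Theorems 4 and 5 together describe a procedure. One way to carry it out in code is sketched below in Python; it is an illustration, not the text's own algorithm. It assumes exact rational arithmetic via `fractions.Fraction`, 0-based row and column indices, and a helper name `rref` of our own choosing. Unlike the proofs, it works column by column, so the non-zero rows come out in echelon order directly instead of being interchanged at the end.

```python
from fractions import Fraction

def rref(A):
    """Return (R, pivots): a row-reduced echelon matrix R row-equivalent
    to A, and the 0-based pivot columns k_1 < k_2 < ... < k_r."""
    R = [[Fraction(x) for x in row] for row in A]
    m = len(R)
    n = len(R[0]) if m else 0
    pivots = []
    r = 0  # number of non-zero rows placed so far
    for k in range(n):
        # Look for a row at or below row r with a non-zero entry in column k.
        p = next((i for i in range(r, m) if R[i][k] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]             # interchange two rows
        lead = R[r][k]
        R[r] = [x / lead for x in R[r]]     # make the leading entry 1
        for i in range(m):                  # clear the rest of column k
            if i != r and R[i][k] != 0:
                f = R[i][k]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(k)
        r += 1
        if r == m:
            break
    return R, pivots

R, pivots = rref([[1, -2, 1], [2, 1, 1], [0, 5, -1]])
print(R)        # rows of R, as Fractions
print(pivots)   # [0, 1]
```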


In Examples 5 and 6, we saw the significance of row-reduced matrices
in solving homogeneous systems of linear equations. Let us now discuss
briefly the system RX = 0, when R is a row-reduced echelon matrix. Let
rows 1, ..., r be the non-zero rows of R, and suppose that the leading
non-zero entry of row i occurs in column $k_i$. The system RX = 0 then
consists of r non-trivial equations. Also the unknown $x_{k_i}$ will occur (with
non-zero coefficient) only in the ith equation. If we let $u_1, \ldots, u_{n-r}$ denote
the (n − r) unknowns which are different from $x_{k_1}, \ldots, x_{k_r}$, then the
r non-trivial equations in RX = 0 are of the form

$$
\begin{aligned}
x_{k_1} + \sum_{j=1}^{n-r} C_{1j} u_j &= 0 \\
&\ \ \vdots \\
x_{k_r} + \sum_{j=1}^{n-r} C_{rj} u_j &= 0.
\end{aligned}
\tag{1-3}
$$
All the solutions to the system of equations RX = 0 are obtained by
assigning any values whatsoever to $u_1, \ldots, u_{n-r}$ and then computing the
corresponding values of $x_{k_1}, \ldots, x_{k_r}$ from (1-3). For example, if R is the
matrix displayed in Example 8, then r = 2, $k_1 = 2$, $k_2 = 4$, and the two
non-trivial equations in the system RX = 0 are

$$
\begin{aligned}
x_2 - 3x_3 + \tfrac12 x_5 &= 0 \quad \text{or} \quad x_2 = 3x_3 - \tfrac12 x_5 \\
x_4 + 2x_5 &= 0 \quad \text{or} \quad x_4 = -2x_5.
\end{aligned}
$$

So we may assign any values to $x_1$, $x_3$ and $x_5$, say $x_1 = a$, $x_3 = b$, $x_5 = c$,
and obtain the solution $(a, 3b - \tfrac12 c, b, -2c, c)$.
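Solving (1-3) is mechanical once R is in row-reduced echelon form. The Python sketch below assigns values to the free unknowns and then computes each $x_{k_i}$. The function name `solve_homogeneous`, the dictionary of free values and the 0-based column numbering are our own assumptions. With the matrix of Example 8 and a = 1, b = 2, c = 4, it should reproduce $(a, 3b - \tfrac12 c, b, -2c, c)$.

```python
from fractions import Fraction

def solve_homogeneous(R, pivots, free_values):
    """Solve RX = 0 for a row-reduced echelon matrix R as in (1-3).

    pivots      -- 0-based columns k_1 < ... < k_r of the leading 1's
    free_values -- dict {free column: value}; missing free columns get 0
    """
    n = len(R[0])
    free = [j for j in range(n) if j not in pivots]
    x = [Fraction(0)] * n
    for j in free:
        x[j] = Fraction(free_values.get(j, 0))
    # Row i of RX = 0 reads  x_{k_i} + sum over free j of R[i][j] * x_j = 0.
    for i, k in enumerate(pivots):
        x[k] = -sum(R[i][j] * x[j] for j in free)
    return x

# The matrix of Example 8: leading 1's in columns 2 and 4 (1 and 3 when 0-based).
R = [[0, 1, -3, 0, Fraction(1, 2)],
     [0, 0, 0, 1, 2],
     [0, 0, 0, 0, 0]]
a, b, c = 1, 2, 4
x = solve_homogeneous(R, [1, 3], {0: a, 2: b, 4: c})
print(x)   # the values (1, 4, 2, -8, 4), printed as Fractions
```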

Let us observe one thing more in connection with the system of
equations RX = 0. If the number r of non-zero rows in R is less than n,
then the system RX = 0 has a non-trivial solution, that is, a solution
$(x_1, \ldots, x_n)$ in which not every $x_j$ is 0. For, since r < n, we can choose
some $x_j$ which is not among the r unknowns $x_{k_1}, \ldots, x_{k_r}$, and we can then
construct a solution as above in which this $x_j$ is 1. This observation leads
us to one of the most fundamental facts concerning systems of homogeneous
linear equations.


Theorem 6. If A is an m × n matrix and m < n, then the homogeneous
system of linear equations AX = 0 has a non-trivial solution.


Proof. Let R be a row-reduced echelon matrix which is row-equivalent
to A. Then the systems AX = 0 and RX = 0 have the same
solutions by Theorem 3. If r is the number of non-zero rows in R, then
certainly r ≤ m, and since m < n, we have r < n. It follows immediately
from our remarks above that AX = 0 has a non-trivial solution. ∎
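The proof of Theorem 6 is constructive. Set one unknown outside $x_{k_1}, \ldots, x_{k_r}$ equal to 1 and the other free unknowns equal to 0, then solve for the rest. A minimal Python sketch of that construction follows. The helper names are hypothetical, and the block bundles its own small row-reduction routine so that it runs on its own.

```python
from fractions import Fraction

def rref(A):
    """Row-reduce A with exact arithmetic; return (R, 0-based pivot columns)."""
    R = [[Fraction(x) for x in row] for row in A]
    m, n = len(R), len(R[0])
    pivots, r = [], 0
    for k in range(n):
        p = next((i for i in range(r, m) if R[i][k] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        lead = R[r][k]
        R[r] = [x / lead for x in R[r]]
        for i in range(m):
            if i != r and R[i][k] != 0:
                f = R[i][k]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(k)
        r += 1
        if r == m:
            break
    return R, pivots

def nontrivial_solution(A):
    """Return a non-zero X with AX = 0, or None if X = 0 is the only solution."""
    R, pivots = rref(A)
    n = len(A[0])
    free = [j for j in range(n) if j not in pivots]
    if not free:                    # r = n: only the trivial solution
        return None
    j0 = free[0]
    x = [Fraction(0)] * n
    x[j0] = Fraction(1)             # this unknown is 1, the other free ones are 0
    for i, k in enumerate(pivots):
        x[k] = -R[i][j0]            # from x_{k_i} + R[i][j0] * 1 = 0
    return x

# A 2 x 3 system (m < n), so Theorem 6 guarantees a non-trivial solution.
print(nontrivial_solution([[1, 2, 3], [4, 5, 6]]))   # values (1, -2, 1)
```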


Theorem 7. If A is an n × n (square) matrix, then A is row-equivalent
to the n × n identity matrix if and only if the system of equations AX = 0
has only the trivial solution.


Proof. If A is row-equivalent to I, then AX = 0 and IX = 0
have the same solutions. Conversely, suppose AX = 0 has only the trivial
solution X = 0. Let R be an n × n row-reduced echelon matrix which is
row-equivalent to A, and let r be the number of non-zero rows of R. Then
RX = 0 has no non-trivial solution. Thus r ≥ n. But since R has n rows,
certainly r ≤ n, and we have r = n. Since this means that R actually has
a leading non-zero entry of 1 in each of its n rows, and since these 1’s
occur each in a different one of the n columns, R must be the n × n identity
matrix. ∎


Let us now ask what elementary row operations do toward solving
a system of linear equations AX = Y which is not homogeneous. At the
outset, one must observe one basic difference between this and the homogeneous
case, namely, that while the homogeneous system always has the
trivial solution $x_1 = \cdots = x_n = 0$, an inhomogeneous system need have
no solution at all.

We form the augmented matrix A’ of the system AX = Y. This
is the m × (n + 1) matrix whose first n columns are the columns of A
and whose last column is Y. More precisely,

$$
A'_{ij} = A_{ij}, \quad \text{if } j \le n, \qquad A'_{i(n+1)} = y_i.
$$

Suppose we perform a sequence of elementary row operations on A,
arriving at a row-reduced echelon matrix R. If we perform this same
sequence of row operations on the augmented matrix A’, we will arrive
at a matrix R’ whose first n columns are the columns of R and whose last
column contains certain scalars $z_1, \ldots, z_m$. The scalars $z_i$ are the entries
of the m × 1 matrix

$$
Z = \begin{bmatrix} z_1 \\ \vdots \\ z_m \end{bmatrix}
$$

which results from applying the sequence of row operations to the matrix
Y. It should be clear to the reader that, just as in the proof of Theorem 3,
the systems AX = Y and RX = Z are equivalent and hence have the
same solutions. It is very easy to determine whether the system RX = Z
has any solutions and to determine all the solutions if any exist. For, if R
has r non-zero rows, with the leading non-zero entry of row i occurring
in column $k_i$, i = 1, ..., r, then the first r equations of RX = Z effectively
express $x_{k_1}, \ldots, x_{k_r}$ in terms of the (n − r) remaining $x_j$ and the
scalars $z_1, \ldots, z_r$. The last (m − r) equations are

$$
\begin{aligned}
0 &= z_{r+1} \\
&\ \ \vdots \\
0 &= z_m
\end{aligned}
$$

and accordingly the condition for the system to have a solution is $z_i = 0$
for i > r. If this condition is satisfied, all solutions to the system are
found just as in the homogeneous case, by assigning arbitrary values to
(n − r) of the $x_j$ and then computing $x_{k_i}$ from the ith equation.
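The test "$z_i = 0$ for i > r" can be sketched in Python as follows. This is an illustration under our own assumptions: exact `Fraction` arithmetic, 0-based indices, and hypothetical function names. It applies to the augmented matrix A’ only row operations that row-reduce A, then checks the last column below the non-zero rows of R. When the test passes, it returns the solution with every free unknown set to 0.

```python
from fractions import Fraction

def reduce_augmented(A, Y):
    """Row-reduce the augmented matrix A' = [A | Y], choosing pivots only
    in the first n columns, so that A' becomes R' = [R | Z]."""
    M = [[Fraction(a) for a in row] + [Fraction(y)] for row, y in zip(A, Y)]
    m, n = len(A), len(A[0])
    pivots, r = [], 0
    for k in range(n):
        p = next((i for i in range(r, m) if M[i][k] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        lead = M[r][k]
        M[r] = [x / lead for x in M[r]]
        for i in range(m):
            if i != r and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(k)
        r += 1
        if r == m:
            break
    return M, pivots

def particular_solution(A, Y):
    """Return one solution of AX = Y (every free unknown set to 0), or
    None when some z_i with i > r is non-zero."""
    M, pivots = reduce_augmented(A, Y)
    n, r = len(A[0]), len(pivots)
    if any(M[i][n] != 0 for i in range(r, len(M))):
        return None
    x = [Fraction(0)] * n
    for i, k in enumerate(pivots):
        x[k] = M[i][n]   # x_{k_i} = z_i when the free unknowns are 0
    return x

A = [[1, -2, 1], [2, 1, 1], [0, 5, -1]]   # the matrix of Example 9
print(particular_solution(A, [1, 2, 0]))  # a solution: 2*1 - 2 + 0 = 0
print(particular_solution(A, [1, 0, 0]))  # None: 2*1 - 0 + 0 != 0
```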


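EXAMPLE 9. Let F be the field of rational numbers and

$$
A = \begin{bmatrix} 1 & -2 & 1 \\ 2 & 1 & 1 \\ 0 & 5 & -1 \end{bmatrix}
$$

and suppose that we wish to solve the system AX = Y for some $y_1$, $y_2$,
and $y_3$. Let us perform a sequence of row operations on the augmented
matrix A’ which row-reduces A:

$$
\begin{aligned}
&\begin{bmatrix} 1 & -2 & 1 & y_1 \\ 2 & 1 & 1 & y_2 \\ 0 & 5 & -1 & y_3 \end{bmatrix}
\to
\begin{bmatrix} 1 & -2 & 1 & y_1 \\ 0 & 5 & -1 & y_2 - 2y_1 \\ 0 & 5 & -1 & y_3 \end{bmatrix}
\to
\begin{bmatrix} 1 & -2 & 1 & y_1 \\ 0 & 5 & -1 & y_2 - 2y_1 \\ 0 & 0 & 0 & y_3 - y_2 + 2y_1 \end{bmatrix} \\
&\to
\begin{bmatrix} 1 & -2 & 1 & y_1 \\ 0 & 1 & -\tfrac15 & \tfrac15(y_2 - 2y_1) \\ 0 & 0 & 0 & y_3 - y_2 + 2y_1 \end{bmatrix}
\to
\begin{bmatrix} 1 & 0 & \tfrac35 & \tfrac15(y_1 + 2y_2) \\ 0 & 1 & -\tfrac15 & \tfrac15(y_2 - 2y_1) \\ 0 & 0 & 0 & y_3 - y_2 + 2y_1 \end{bmatrix}.
\end{aligned}
$$

The condition that the system AX = Y have a solution is thus

$$
2y_1 - y_2 + y_3 = 0
$$

and if the given scalars $y_i$ satisfy this condition, all solutions are obtained
by assigning a value c to $x_3$ and then computing

$$
\begin{aligned}
x_1 &= -\tfrac35 c + \tfrac15(y_1 + 2y_2) \\
x_2 &= \tfrac15 c + \tfrac15(y_2 - 2y_1).
\end{aligned}
$$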
Let us observe one final thing about the system AX = Y. Suppose
the entries of the matrix A and the scalars $y_1, \ldots, y_m$ happen to lie in a
subfield $F_1$ of the field F. If the system of equations AX = Y has a solution
with $x_1, \ldots, x_n$ in F, it has a solution with $x_1, \ldots, x_n$ in $F_1$. For,
over either field, the condition for the system to have a solution is that
certain relations hold between $y_1, \ldots, y_m$ in $F_1$ (the relations $z_i = 0$ for
i > r, above). For example, if AX = Y is a system of linear equations
in which the scalars $y_k$ and $A_{ij}$ are real numbers, and if there is a solution
in which $x_1, \ldots, x_n$ are complex numbers, then there is a solution with
$x_1, \ldots, x_n$ real numbers.


Exercises 


1. Find all solutions to the following system of equations by row-reducing the
coefficient matrix:

$$
\begin{aligned}
\tfrac13 x_1 + 2x_2 - 6x_3 &= 0 \\
-4x_1 + 5x_3 &= 0 \\
-3x_1 + 6x_2 - 13x_3 &= 0 \\
-\tfrac73 x_1 + 2x_2 - \tfrac83 x_3 &= 0
\end{aligned}
$$


2. Find a row-reduced echelon matrix which is row-equivalent to

$$
A = \begin{bmatrix} 1 & -i \\ 2 & 2 \\ i & 1 + i \end{bmatrix}.
$$

What are the solutions of AX = 0?
3. Describe explicitly all 2 × 2 row-reduced echelon matrices.
4. Consider the system of equations

$$
\begin{aligned}
x_1 - x_2 + 2x_3 &= 1 \\
2x_1 + 2x_3 &= 1 \\
x_1 - 3x_2 + 4x_3 &= 2.
\end{aligned}
$$
Does this system have a solution? If so, describe explicitly all solutions. 


5. Give an example of a system of two linear equations in two unknowns which 
has no solution. 


6. Show that the system 


$$
\begin{aligned}
x_1 - 2x_2 + x_3 + 2x_4 &= 1 \\
x_1 + x_2 - x_3 + x_4 &= 2 \\
x_1 + 7x_2 - 5x_3 - x_4 &= 3
\end{aligned}
$$


has no solution. 


7. Find all solutions of 


$$
\begin{aligned}
2x_1 - 3x_2 - 7x_3 + 5x_4 + 2x_5 &= -2 \\
x_1 - 2x_2 - 4x_3 + 3x_4 + x_5 &= -2 \\
2x_1 - 4x_3 + 2x_4 + x_5 &= 3 \\
x_1 - 5x_2 - 7x_3 + 6x_4 + 2x_5 &= -7.
\end{aligned}
$$


8. Let 


$$
A = \begin{bmatrix} 3 & -1 & 2 \\ 2 & 1 & 1 \\ 1 & -3 & 0 \end{bmatrix}.
$$


For which triples $(y_1, y_2, y_3)$ does the system AX = Y have a solution?


9. Let 
$$
A = \begin{bmatrix} 3 & -6 & 2 & -1 \\ -2 & 4 & 1 & 3 \\ 0 & 0 & 1 & 1 \\ 1 & -2 & 1 & 0 \end{bmatrix}.
$$


For which $(y_1, y_2, y_3, y_4)$ does the system of equations AX = Y have a solution?


10. Suppose R and R’ are 2 × 3 row-reduced echelon matrices and that the
systems RX = 0 and R’X = 0 have exactly the same solutions. Prove that R = R’.


1.5. Matrix Multiplication 


It is apparent (or should be, at any rate) that the process of forming 
linear combinations of the rows of a matrix is a fundamental one. For this 
reason it is advantageous to introduce a systematic scheme for indicating 
just what operations are to be performed. More specifically, suppose B
is an n × p matrix over a field F with rows $\beta_1, \ldots, \beta_n$ and that from B we
construct a matrix C with rows $\gamma_1, \ldots, \gamma_m$ by forming certain linear
combinations

$$
\gamma_i = A_{i1}\beta_1 + A_{i2}\beta_2 + \cdots + A_{in}\beta_n. \tag{1-4}
$$

The rows of C are determined by the mn scalars $A_{ij}$ which are themselves
the entries of an m × n matrix A. If (1-4) is expanded to

$$
(C_{i1} \cdots C_{ip}) = \sum_{r=1}^{n} (A_{ir}B_{r1} \cdots A_{ir}B_{rp})
$$

we see that the entries of C are given by

$$
C_{ij} = \sum_{r=1}^{n} A_{ir}B_{rj}.
$$
Definition. Let A be an m × n matrix over the field F and let B be an
n × p matrix over F. The product AB is the m × p matrix C whose i, j
entry is

$$
C_{ij} = \sum_{r=1}^{n} A_{ir}B_{rj}.
$$
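The defining formula translates directly into code. Below is a minimal Python sketch, assuming matrices are stored as lists of rows; `matmul` is our own name, not a library function. It raises an error when the number of columns of A differs from the number of rows of B, in which case the product is not defined.

```python
def matmul(A, B):
    """Product AB of an m x n matrix A and an n x p matrix B, both stored
    as lists of rows:  C[i][j] = sum over r of A[i][r] * B[r][j]."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("AB is undefined: columns of A != rows of B")
    p = len(B[0])
    return [[sum(row[r] * B[r][j] for r in range(n)) for j in range(p)]
            for row in A]

# Example 10(a)
print(matmul([[1, 0], [-3, 1]], [[5, -1, 2], [15, 4, 8]]))   # [[5, -1, 2], [0, 7, 2]]
```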
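EXAMPLE 10. Here are some products of matrices with rational entries.

(a)
$$
\begin{bmatrix} 5 & -1 & 2 \\ 0 & 7 & 2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ -3 & 1 \end{bmatrix}
  \begin{bmatrix} 5 & -1 & 2 \\ 15 & 4 & 8 \end{bmatrix}
$$

Here

$$
\begin{aligned}
\gamma_1 &= (5 \;\; {-1} \;\; 2) = 1 \cdot (5 \;\; {-1} \;\; 2) + 0 \cdot (15 \;\; 4 \;\; 8) \\
\gamma_2 &= (0 \;\; 7 \;\; 2) = -3(5 \;\; {-1} \;\; 2) + 1 \cdot (15 \;\; 4 \;\; 8)
\end{aligned}
$$

(b)
$$
\begin{bmatrix} 0 & 6 & 1 \\ 9 & 12 & -8 \\ 12 & 62 & -3 \\ 3 & 8 & -2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ -2 & 3 \\ 5 & 4 \\ 0 & 1 \end{bmatrix}
  \begin{bmatrix} 0 & 6 & 1 \\ 3 & 8 & -2 \end{bmatrix}
$$

Here

$$
\begin{aligned}
\gamma_2 &= (9 \;\; 12 \;\; {-8}) = -2(0 \;\; 6 \;\; 1) + 3(3 \;\; 8 \;\; {-2}) \\
\gamma_3 &= (12 \;\; 62 \;\; {-3}) = 5(0 \;\; 6 \;\; 1) + 4(3 \;\; 8 \;\; {-2})
\end{aligned}
$$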


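(c)
$$
\begin{bmatrix} 6 & 12 \end{bmatrix} = \begin{bmatrix} 3 \end{bmatrix} \begin{bmatrix} 2 & 4 \end{bmatrix}
$$

Here

$$
\gamma = (6 \;\; 12) = 3(2 \;\; 4)
$$

(d)
$$
\begin{bmatrix} -2 & -4 \\ 6 & 12 \end{bmatrix}
= \begin{bmatrix} -1 \\ 3 \end{bmatrix} \begin{bmatrix} 2 & 4 \end{bmatrix}
$$

(e)
$$
\begin{bmatrix} 2 & 4 \end{bmatrix} \begin{bmatrix} -1 \\ 3 \end{bmatrix} = \begin{bmatrix} 10 \end{bmatrix}
$$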
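(f)
$$
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & -5 & 2 \\ 2 & 3 & 4 \\ 9 & -1 & 3 \end{bmatrix}
= \begin{bmatrix} 2 & 3 & 4 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
$$

(g)
$$
\begin{bmatrix} 1 & -5 & 2 \\ 2 & 3 & 4 \\ 9 & -1 & 3 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 1 & 0 \\ 0 & 2 & 0 \\ 0 & 9 & 0 \end{bmatrix}
$$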
It is important to observe that the product of two matrices need not
be defined; the product is defined if and only if the number of columns in 
the first matrix coincides with the number of rows in the second matrix. 
Thus it is meaningless to interchange the order of the factors in (a), (b), 
and (c) above. Frequently we shall write products such as AB without 
explicitly mentioning the sizes of the factors and in such cases it will be 
understood that the product is defined. From (d), (e), (f), (g) we find that 
even when the products AB and BA are both defined it need not be true 
that AB = BA; in other words, matrix multiplication is not commutative. 
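The failure of commutativity is easy to observe numerically. This Python sketch uses a hypothetical `matmul` helper on lists of rows, which assumes the product is defined. It multiplies the two factors of (e) in both orders, then the two 3 × 3 matrices of (f) and (g).

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(row[r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for row in A]

# A 2 x 1 column and a 1 x 2 row: both products exist but differ in size.
col, rowv = [[-1], [3]], [[2, 4]]
print(matmul(col, rowv))   # [[-2, -4], [6, 12]]
print(matmul(rowv, col))   # [[10]]

# Two 3 x 3 matrices that do not commute.
A = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
B = [[1, -5, 2], [2, 3, 4], [9, -1, 3]]
print(matmul(A, B))        # [[2, 3, 4], [0, 0, 0], [0, 0, 0]]
print(matmul(B, A))        # [[0, 1, 0], [0, 2, 0], [0, 9, 0]]
```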


EXAMPLE 11.


(a) If I is the m × m identity matrix and A is an m × n matrix,
IA = A.

(b) If I is the n × n identity matrix and A is an m × n matrix,
AI = A.

(c) If $0^{k,m}$ is the k × m zero matrix, $0^{k,n} = 0^{k,m}A$. Similarly,
$A0^{n,p} = 0^{m,p}$.


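EXAMPLE 12. Let A be an m × n matrix over F. Our earlier shorthand
notation, AX = Y, for systems of linear equations is consistent
with our definition of matrix products. For if

$$
X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}
$$

then AX is the m × 1 matrix

$$
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}
$$

such that $y_i = A_{i1}x_1 + A_{i2}x_2 + \cdots + A_{in}x_n$.

The use of column matrices suggests a notation which is frequently
useful. If B is an n × p matrix, the columns of B are the n × 1 matrices
$B_1, \ldots, B_p$ defined by

$$
B_j = \begin{bmatrix} B_{1j} \\ \vdots \\ B_{nj} \end{bmatrix}, \qquad 1 \le j \le p.
$$

The matrix B is the succession of these columns:

$$
B = [B_1, \ldots, B_p].
$$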
The i, j entry of the product matrix AB is formed from the ith row of A
and the jth column of B. The reader should verify that the jth column of
AB is $AB_j$:

$$
AB = [AB_1, \ldots, AB_p].
$$
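Here is a quick numerical check of the column description of AB, using the factors of Example 10(b). This is a Python sketch: `matmul` and `column` are hypothetical helpers operating on lists of rows, and columns are numbered from 0.

```python
def matmul(A, B):
    """Product of matrices stored as lists of rows (sizes assumed compatible)."""
    return [[sum(row[r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for row in A]

def column(M, j):
    """Column j of M (0-based), as a one-column matrix."""
    return [[row[j]] for row in M]

A = [[1, 0], [-2, 3], [5, 4], [0, 1]]   # the 4 x 2 factor of Example 10(b)
B = [[0, 6, 1], [3, 8, -2]]             # the 2 x 3 factor
AB = matmul(A, B)
for j in range(len(B[0])):
    # Column j of AB versus A times column j of B.
    print(j, column(AB, j) == matmul(A, column(B, j)))   # True for j = 0, 1, 2
```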
In spite of the fact that a product of matrices depends upon the 
order in which the factors are written, it is independent of the way in 
which they are associated, as the next theorem shows. 


Theorem 8. If A, B, C are matrices over the field F such that the products
BC and A(BC) are defined, then so are the products AB, (AB)C and

$$
A(BC) = (AB)C.
$$

Proof. Suppose B is an n × p matrix. Since BC is defined, C is
a matrix with p rows, and BC has n rows. Because A(BC) is defined we
may assume A is an m × n matrix. Thus the product AB exists and is an
m × p matrix, from which it follows that the product (AB)C exists. To
show that A(BC) = (AB)C means to show that

$$
[A(BC)]_{ij} = [(AB)C]_{ij}
$$

for each i, j. By definition

$$
\begin{aligned}
[A(BC)]_{ij} &= \sum_r A_{ir}(BC)_{rj} \\
&= \sum_r A_{ir} \sum_s B_{rs}C_{sj} \\
&= \sum_r \sum_s A_{ir}B_{rs}C_{sj} \\
&= \sum_s \sum_r A_{ir}B_{rs}C_{sj} \\
&= \sum_s \Big(\sum_r A_{ir}B_{rs}\Big) C_{sj} \\
&= [(AB)C]_{ij}.
\end{aligned}
$$
∎


When A is an n × n (square) matrix, the product AA is defined.
We shall denote this matrix by $A^2$. By Theorem 8, (AA)A = A(AA) or
$A^2A = AA^2$, so that the product AAA is unambiguously defined. This
product we denote by $A^3$. In general, the product $AA \cdots A$ (k times) is
unambiguously defined, and we shall denote this product by $A^k$.

Note that the relation A(BC) = (AB)C implies among other things 
that linear combinations of linear combinations of the rows of C are again 
linear combinations of the rows of C. 

If B is a given matrix and C is obtained from B by means of an elementary
row operation, then each row of C is a linear combination of the
rows of B, and hence there is a matrix A such that AB = C. In general
there are many such matrices A, and among all such it is convenient and
possible to choose one having a number of special properties. Before going
into this we need to introduce a class of matrices.


Definition. An m × m matrix is said to be an elementary matrix if
it can be obtained from the m × m identity matrix by means of a single
elementary row operation.


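EXAMPLE 13. A 2 × 2 elementary matrix is necessarily one of the
following:

$$
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}, \quad
\begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}, \quad
\begin{bmatrix} 1 & 0 \\ c & 1 \end{bmatrix},
$$

$$
\begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix}, \ c \ne 0, \qquad
\begin{bmatrix} 1 & 0 \\ 0 & c \end{bmatrix}, \ c \ne 0.
$$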


Theorem 9. Let e be an elementary row operation and let E be the
m × m elementary matrix E = e(I). Then, for every m × n matrix A,

$$
e(A) = EA.
$$

Proof. The point of the proof is that the entry in the ith row
and jth column of the product matrix EA is obtained from the ith row of
E and the jth column of A. The three types of elementary row operations
should be taken up separately. We shall give a detailed proof for an operation
of type (ii). The other two cases are even easier to handle than this
one and will be left as exercises. Suppose r ≠ s and e is the operation
‘replacement of row r by row r plus c times row s.’ Then

$$
E_{ik} = \begin{cases} \delta_{ik}, & i \ne r \\ \delta_{rk} + c\,\delta_{sk}, & i = r. \end{cases}
$$

Therefore,

$$
(EA)_{ij} = \sum_{k=1}^{m} E_{ik}A_{kj} = \begin{cases} A_{ij}, & i \ne r \\ A_{rj} + cA_{sj}, & i = r. \end{cases}
$$

In other words EA = e(A). ∎
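Theorem 9 can be spot-checked on a particular matrix. The Python sketch below implements the three kinds of elementary row operations as functions and verifies e(A) = e(I)A for one instance of each. The names `scale`, `add_multiple` and `swap` are our own, and rows are numbered from 0. A check on one example illustrates the theorem but does not prove it.

```python
from fractions import Fraction

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def matmul(A, B):
    return [[sum(row[r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for row in A]

def scale(M, r, c):
    """Multiply row r by the non-zero scalar c."""
    assert c != 0
    return [[c * x for x in row] if i == r else list(row) for i, row in enumerate(M)]

def add_multiple(M, r, s, c):
    """Replace row r by row r plus c times row s (r != s)."""
    assert r != s
    return [[x + c * y for x, y in zip(row, M[s])] if i == r else list(row)
            for i, row in enumerate(M)]

def swap(M, r, s):
    """Interchange rows r and s."""
    out = [list(row) for row in M]
    out[r], out[s] = out[s], out[r]
    return out

A = [[1, -5, 2, 0], [2, 3, 4, 1], [9, -1, 3, 7]]   # a 3 x 4 matrix
operations = [lambda M: scale(M, 1, Fraction(1, 2)),
              lambda M: add_multiple(M, 0, 2, -3),
              lambda M: swap(M, 0, 2)]
for e in operations:
    E = e(identity(3))              # the elementary matrix E = e(I)
    print(e(A) == matmul(E, A))     # True for each operation
```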


Corollary. Let A and B be m × n matrices over the field F. Then B
is row-equivalent to A if and only if B = PA, where P is a product of m × m
elementary matrices.


Proof. Suppose B = PA where $P = E_s \cdots E_2E_1$ and the $E_i$ are
m × m elementary matrices. Then $E_1A$ is row-equivalent to A, and
$E_2(E_1A)$ is row-equivalent to $E_1A$. So $E_2E_1A$ is row-equivalent to A; and
continuing in this way we see that $(E_s \cdots E_1)A$ is row-equivalent to A.
Now suppose that B is row-equivalent to A. Let $E_1, E_2, \ldots, E_s$ be
the elementary matrices corresponding to some sequence of elementary
row operations which carries A into B. Then $B = (E_s \cdots E_1)A$. ∎




Exercises 


1. Let 


Compute ABC and CAB. 
2. Let 


Verify directly that $A(AB) = A^2B$.
3. Find two different 2 × 2 matrices A such that $A^2 = 0$ but A ≠ 0.


4. For the matrix A of Exercise 2, find elementary matrices $E_1, E_2, \ldots, E_k$
such that

$$
E_k \cdots E_2E_1A = I.
$$


1 4 
A -| 2} B= E At 
1 0 


Is there a matrix C such that CA = B? 


5. Let 


6. Let A be an m × n matrix and B an n × k matrix. Show that the columns of
C = AB are linear combinations of the columns of A. If $\alpha_1, \ldots, \alpha_n$ are the columns
of A and $\gamma_1, \ldots, \gamma_k$ are the columns of C, then

$$
\gamma_j = \sum_{r=1}^{n} B_{rj}\alpha_r.
$$
r=1 


7. Let A and B be 2 × 2 matrices such that AB = I. Prove that BA = I.


8. Let

$$
C = \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}
$$

be a 2 × 2 matrix. We inquire when it is possible to find 2 × 2 matrices A and B
such that C = AB − BA. Prove that such matrices can be found if and only if
$C_{11} + C_{22} = 0$.


1.6. Invertible Matrices 


Suppose P is an m × m matrix which is a product of elementary
matrices. For each m × n matrix A, the matrix B = PA is row-equivalent
to A; hence A is row-equivalent to B and there is a product Q of elementary
matrices such that A = QB. In particular this is true when A is the
m × m identity matrix. In other words, there is an m × m matrix Q,
which is itself a product of elementary matrices, such that QP = I. As
we shall soon see, the existence of a Q with QP = I is equivalent to the
fact that P is a product of elementary matrices.


Definition. Let A be an n × n (square) matrix over the field F. An
n × n matrix B such that BA = I is called a left inverse of A; an n × n
matrix B such that AB = I is called a right inverse of A. If AB = BA = I,
then B is called a two-sided inverse of A and A is said to be invertible.


Lemma. If A has a left inverse B and a right inverse C, then B = C.
Proof. Suppose BA = I and AC = I. Then
B = BI = B(AC) = (BA)C = IC = C. ∎
Thus if A has a left and a right inverse, A is invertible and has a
unique two-sided inverse, which we shall denote by $A^{-1}$ and simply call
the inverse of A.


Theorem 10. Let A and B be n × n matrices over F.


(i) If A is invertible, so is $A^{-1}$ and $(A^{-1})^{-1} = A$.
(ii) If both A and B are invertible, so is AB, and $(AB)^{-1} = B^{-1}A^{-1}$.


Proof. The first statement is evident from the symmetry of the
definition. The second follows upon verification of the relations

$$
(AB)(B^{-1}A^{-1}) = (B^{-1}A^{-1})(AB) = I.
$$
∎

Corollary. A product of invertible matrices is invertible.


Theorem 11. An elementary matrix is invertible.


Proof. Let E be an elementary matrix corresponding to the
elementary row operation e. If $e_1$ is the inverse operation of e (Theorem 2)
and $E_1 = e_1(I)$, then

$$
EE_1 = e(E_1) = e(e_1(I)) = I
$$

and

$$
E_1E = e_1(E) = e_1(e(I)) = I
$$

so that E is invertible and $E_1 = E^{-1}$. ∎


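EXAMPLE 14.

(a)
$$
\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}^{-1} = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
$$

(b)
$$
\begin{bmatrix} 1 & c \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & -c \\ 0 & 1 \end{bmatrix}
$$

(c)
$$
\begin{bmatrix} 1 & 0 \\ c & 1 \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ -c & 1 \end{bmatrix}
$$

(d) When c ≠ 0,

$$
\begin{bmatrix} c & 0 \\ 0 & 1 \end{bmatrix}^{-1} = \begin{bmatrix} c^{-1} & 0 \\ 0 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & 0 \\ 0 & c \end{bmatrix}^{-1} = \begin{bmatrix} 1 & 0 \\ 0 & c^{-1} \end{bmatrix}.
$$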
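The inverse formulas of Example 14 can be confirmed by multiplying out. The following Python sketch does so for one assumed value c = 5/3, using exact `Fraction` arithmetic and a hypothetical `matmul` helper. Any non-zero c would do for (d), and (b) and (c) hold for every c.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(row[r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for row in A]

I2 = [[1, 0], [0, 1]]
c = Fraction(5, 3)        # an assumed sample value; (d) needs c != 0
pairs = [
    ([[0, 1], [1, 0]], [[0, 1], [1, 0]]),         # (a)
    ([[1, c], [0, 1]], [[1, -c], [0, 1]]),        # (b)
    ([[1, 0], [c, 1]], [[1, 0], [-c, 1]]),        # (c)
    ([[c, 0], [0, 1]], [[1 / c, 0], [0, 1]]),     # (d), first matrix
    ([[1, 0], [0, c]], [[1, 0], [0, 1 / c]]),     # (d), second matrix
]
for E, E_inv in pairs:
    print(matmul(E, E_inv) == I2 and matmul(E_inv, E) == I2)   # True each time
```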


Theorem 12. If A is an n × n matrix, the following are equivalent.


(i) A is invertible.
(ii) A is row-equivalent to the n × n identity matrix.
(iii) A is a product of elementary matrices.


Proof. Let R be a row-reduced echelon matrix which is row-equivalent
to A. By Theorem 9 (or its corollary),

$$
R = E_k \cdots E_2E_1A
$$

where $E_1, \ldots, E_k$ are elementary matrices. Each $E_j$ is invertible, and so

$$
A = E_1^{-1} \cdots E_k^{-1}R.
$$

Since products of invertible matrices are invertible, we see that A is invertible
if and only if R is invertible. Since R is a (square) row-reduced
echelon matrix, R is invertible if and only if each row of R contains a
non-zero entry, that is, if and only if R = I. We have now shown that A
is invertible if and only if R = I, and if R = I then $A = E_1^{-1} \cdots E_k^{-1}$.
It should now be apparent that (i), (ii), and (iii) are equivalent statements
about A. ∎


Corollary. If A is an invertible n × n matrix and if a sequence of
elementary row operations reduces A to the identity, then that same sequence
of operations when applied to I yields $A^{-1}$.
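This corollary is the basis of the usual way to compute inverses: row-reduce A and apply the same operations to I. The minimal Python sketch below assumes exact `Fraction` arithmetic and uses our own function name `inverse`. It carries A and I side by side as one n × 2n array. If at some stage the current column has no usable non-zero entry, the row-reduced echelon form of A has a zero row. By Theorem 12, A is then not invertible, and the sketch returns `None`.

```python
from fractions import Fraction

def inverse(A):
    """Reduce A to I while applying the same row operations to I.
    Returns A^{-1}, or None when A is not invertible."""
    n = len(A)
    # Row i of M is row i of A followed by row i of the identity.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for k in range(n):
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            return None                          # A cannot be reduced to I
        M[k], M[p] = M[p], M[k]                  # interchange rows
        lead = M[k][k]
        M[k] = [x / lead for x in M[k]]          # make the pivot 1
        for i in range(n):                       # clear column k elsewhere
            if i != k and M[i][k] != 0:
                f = M[i][k]
                M[i] = [a - f * b for a, b in zip(M[i], M[k])]
    return [row[n:] for row in M]                # the right half is now A^{-1}

# The matrix of Example 15; expect (1/7) * [[3, 1], [-1, 2]].
print(inverse([[2, -1], [1, 3]]))
```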


Corollary. Let A and B be m × n matrices. Then B is row-equivalent
to A if and only if B = PA where P is an invertible m × m matrix.


Theorem 13. For an n × n matrix A, the following are equivalent.


(i) A is invertible.
(ii) The homogeneous system AX = 0 has only the trivial solution
X = 0.
(iii) The system of equations AX = Y has a solution X for each n × 1
matrix Y.


Proof. According to Theorem 7, condition (ii) is equivalent to
the fact that A is row-equivalent to the identity matrix. By Theorem 12,
(i) and (ii) are therefore equivalent. If A is invertible, the solution of
AX = Y is $X = A^{-1}Y$. Conversely, suppose AX = Y has a solution for
each given Y. Let R be a row-reduced echelon matrix which is row-equivalent
to A. We wish to show that R = I. That amounts to showing
that the last row of R is not (identically) 0. Let

$$
E = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ 1 \end{bmatrix}.
$$

If the system RX = E can be solved for X, the last row of R cannot be 0.
We know that R = PA, where P is invertible. Thus RX = E if and only
if $AX = P^{-1}E$. According to (iii), the latter system has a solution. ∎


Corollary. A square matrix with either a left or right inverse is invertible.


Proof. Let A be an n × n matrix. Suppose A has a left inverse,
i.e., a matrix B such that BA = I. Then AX = 0 has only the trivial
solution, because if AX = 0 then X = IX = B(AX) = 0. Therefore A is invertible. On the
other hand, suppose A has a right inverse, i.e., a matrix C such that
AC = I. Then C has a left inverse and is therefore invertible. It then
follows that $A = C^{-1}$ and so A is invertible with inverse C. ∎


Corollary. Let $A = A_1A_2 \cdots A_k$, where $A_1, \ldots, A_k$ are n × n (square)
matrices. Then A is invertible if and only if each $A_j$ is invertible.


Proof. We have already shown that the product of two invertible
matrices is invertible. From this one sees easily that if each $A_j$ is invertible
then A is invertible.

Suppose now that A is invertible. We first prove that $A_k$ is invertible.
Suppose X is an n × 1 matrix and $A_kX = 0$. Then
$AX = (A_1 \cdots A_{k-1})A_kX = 0$. Since A is invertible we must have X = 0. The
system of equations $A_kX = 0$ thus has no non-trivial solution, so $A_k$ is
invertible. But now $A_1 \cdots A_{k-1} = AA_k^{-1}$ is invertible. By the preceding
argument, $A_{k-1}$ is invertible. Continuing in this way, we conclude that
each $A_j$ is invertible. ∎


We should like to make one final comment about the solution of
linear equations. Suppose A is an m × n matrix and we wish to solve the
system of equations AX = Y. If R is a row-reduced echelon matrix which
is row-equivalent to A, then R = PA where P is an m × m invertible
matrix. The solutions of the system AX = Y are exactly the same as the
solutions of the system RX = PY (= Z). In practice, it is not much more
difficult to find the matrix P than it is to row-reduce A to R. For, suppose
we form the augmented matrix A’ of the system AX = Y, with arbitrary
scalars $y_1, \ldots, y_m$ occurring in the last column. If we then perform on A’
a sequence of elementary row operations which leads from A to R, it will
become evident what the matrix P is. (The reader should refer to Example 9
where we essentially carried out this process.) In particular, if A
is a square matrix, this process will make it clear whether or not A is
invertible and if A is invertible what the inverse P is. Since we have
already given the nucleus of one example of such a computation, we shall
content ourselves with a 2 × 2 example.


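EXAMPLE 15. Suppose F is the field of rational numbers and

$$
A = \begin{bmatrix} 2 & -1 \\ 1 & 3 \end{bmatrix}.
$$

Then

$$
\begin{aligned}
&\begin{bmatrix} 2 & -1 & y_1 \\ 1 & 3 & y_2 \end{bmatrix}
\to \begin{bmatrix} 1 & 3 & y_2 \\ 2 & -1 & y_1 \end{bmatrix}
\to \begin{bmatrix} 1 & 3 & y_2 \\ 0 & -7 & y_1 - 2y_2 \end{bmatrix} \\
&\to \begin{bmatrix} 1 & 3 & y_2 \\ 0 & 1 & \tfrac17(2y_2 - y_1) \end{bmatrix}
\to \begin{bmatrix} 1 & 0 & \tfrac17(y_2 + 3y_1) \\ 0 & 1 & \tfrac17(2y_2 - y_1) \end{bmatrix}
\end{aligned}
$$

from which it is clear that A is invertible and

$$
A^{-1} = \begin{bmatrix} \tfrac37 & \tfrac17 \\ -\tfrac17 & \tfrac27 \end{bmatrix}.
$$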


It may seem cumbersome to continue writing the arbitrary scalars
$y_1, y_2, \ldots$ in the computation of inverses. Some people find it less awkward
to carry along two sequences of matrices, one describing the reduction of
A to the identity and the other recording the effect of the same sequence
of operations starting from the identity. The reader may judge for himself
which is a neater form of bookkeeping.




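EXAMPLE 16. Let us find the inverse of

$$
A = \begin{bmatrix} 1 & \tfrac12 & \tfrac13 \\ \tfrac12 & \tfrac13 & \tfrac14 \\ \tfrac13 & \tfrac14 & \tfrac15 \end{bmatrix}.
$$

In each line below, the left matrix records the reduction of A and the right matrix records the same operations applied to I:

$$
\begin{aligned}
&\begin{bmatrix} 1 & \tfrac12 & \tfrac13 \\ \tfrac12 & \tfrac13 & \tfrac14 \\ \tfrac13 & \tfrac14 & \tfrac15 \end{bmatrix},
&&\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \\
&\begin{bmatrix} 1 & \tfrac12 & \tfrac13 \\ 0 & \tfrac1{12} & \tfrac1{12} \\ 0 & \tfrac1{12} & \tfrac4{45} \end{bmatrix},
&&\begin{bmatrix} 1 & 0 & 0 \\ -\tfrac12 & 1 & 0 \\ -\tfrac13 & 0 & 1 \end{bmatrix} \\
&\begin{bmatrix} 1 & \tfrac12 & \tfrac13 \\ 0 & \tfrac1{12} & \tfrac1{12} \\ 0 & 0 & \tfrac1{180} \end{bmatrix},
&&\begin{bmatrix} 1 & 0 & 0 \\ -\tfrac12 & 1 & 0 \\ \tfrac16 & -1 & 1 \end{bmatrix} \\
&\begin{bmatrix} 1 & \tfrac12 & \tfrac13 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{bmatrix},
&&\begin{bmatrix} 1 & 0 & 0 \\ -6 & 12 & 0 \\ 30 & -180 & 180 \end{bmatrix} \\
&\begin{bmatrix} 1 & \tfrac12 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
&&\begin{bmatrix} -9 & 60 & -60 \\ -36 & 192 & -180 \\ 30 & -180 & 180 \end{bmatrix} \\
&\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix},
&&\begin{bmatrix} 9 & -36 & 30 \\ -36 & 192 & -180 \\ 30 & -180 & 180 \end{bmatrix}
\end{aligned}
$$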
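The outcome of Example 16 can be confirmed by multiplying A by the computed inverse. The Python sketch below builds A from the rule $A_{ij} = 1/(i + j - 1)$ with 1-based indices, which is our description of the pattern in this 3 × 3 matrix. It then checks both products against the identity using exact rational arithmetic.

```python
from fractions import Fraction

n = 3
# With 0-based i, j the entry 1/(i + j - 1) becomes 1/(i + j + 1).
A = [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]
A_inv = [[9, -36, 30], [-36, 192, -180], [30, -180, 180]]
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][r] * Y[r][j] for r in range(n)) for j in range(n)]
            for i in range(n)]

print(matmul(A, A_inv) == I, matmul(A_inv, A) == I)   # True True
```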


It must have occurred to the reader that we have carried on a lengthy 
discussion of the rows of matrices and have said little about the columns. 
We focused our attention on the rows because this seemed more natural 
from the point of view of linear equations. Since there is obviously nothing 
sacred about rows, the discussion in the last sections could have been 
carried on using columns rather than rows. If one defines an elementary 
column operation and column-equivalence in a manner analogous to that 
of elementary row operation and row-equivalence, it is clear that each
m × n matrix will be column-equivalent to a ‘column-reduced echelon’
matrix. Also each elementary column operation will be of the form
A → AE, where E is an n × n elementary matrix, and so on.
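Here is a small Python check of what A → AE means concretely; the helper names are hypothetical and indices are 0-based. Let E = e(I) come from the row operation "row 1 ← row 1 + 2·row 3" on the 4 × 4 identity. Multiplying A on the right by E then adds 2 times column 1 of A to column 3. The roles of the two indices are exchanged compared with the row operation that produced E.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(row[r] * B[r][j] for r in range(len(B))) for j in range(len(B[0]))]
            for row in A]

def add_row_multiple(M, r, s, c):
    """Elementary row operation: row r <- row r + c * row s (r != s)."""
    return [[x + c * y for x, y in zip(row, M[s])] if i == r else list(row)
            for i, row in enumerate(M)]

def add_column_multiple(M, r, s, c):
    """Elementary column operation: column r <- column r + c * column s."""
    return [[x + c * row[s] if j == r else x for j, x in enumerate(row)]
            for row in M]

A = [[1, 2, 1, 0], [-1, 0, 3, 5], [1, -2, 1, 1]]   # a 3 x 4 matrix
E = add_row_multiple(identity(4), 1, 3, 2)         # 4 x 4 elementary matrix e(I)

# Right multiplication by E adds 2 * (column 1) to column 3 of A.
print(matmul(A, E) == add_column_multiple(A, 3, 1, 2))   # True
```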


Exercises 
1. Let

$$
A = \begin{bmatrix} 1 & 2 & 1 & 0 \\ -1 & 0 & 3 & 5 \\ 1 & -2 & 1 & 1 \end{bmatrix}.
$$

Find a row-reduced echelon matrix R which is row-equivalent to A and an invertible
3 × 3 matrix P such that R = PA.


2. Do Exercise 1, but with

$$
A = \begin{bmatrix} 2 & 0 & i \\ 1 & -3 & -i \\ i & 1 & 1 \end{bmatrix}.
$$


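3. For each of the two matrices

$$
\begin{bmatrix} 2 & 5 & -1 \\ 4 & -1 & 2 \\ 6 & 4 & 1 \end{bmatrix}, \qquad
\begin{bmatrix} 1 & -1 & 2 \\ 3 & 2 & 4 \\ 0 & 1 & -2 \end{bmatrix}
$$

use elementary row operations to discover whether it is invertible, and to find the
inverse in case it is.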


4. Let 
For which X does there exist a scalar c such that AX = cX?


5. Discover whether

$$
A = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & 2 & 3 & 4 \\ 0 & 0 & 3 & 4 \\ 0 & 0 & 0 & 4 \end{bmatrix}
$$

is invertible, and find $A^{-1}$ if it exists.


6. Suppose A is a 2 × 1 matrix and that B is a 1 × 2 matrix. Prove that C = AB
is not invertible.


7. Let A be an n × n (square) matrix. Prove the following two statements:
(a) If A is invertible and AB = 0 for some n × n matrix B, then B = 0.
(b) If A is not invertible, then there exists an n × n matrix B such that
AB = 0 but B ≠ 0.


8. Let

$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.
$$

Prove, using elementary row operations, that A is invertible if and only if
(ad − bc) ≠ 0.


9. An n × n matrix A is called upper-triangular if $A_{ij} = 0$ for i > j, that is,
if every entry below the main diagonal is 0. Prove that an upper-triangular (square)
matrix is invertible if and only if every entry on its main diagonal is different
from 0.

10. Prove the following generalization of Exercise 6. If A is an m × n matrix,
B is an n × m matrix and n < m, then AB is not invertible.

11. Let A be an m × n matrix. Show that by means of a finite number of elementary
row and/or column operations one can pass from A to a matrix R which
is both ‘row-reduced echelon’ and ‘column-reduced echelon,’ i.e., $R_{ij} = 0$ if i ≠ j,
$R_{ii} = 1$, 1 ≤ i ≤ r, $R_{ii} = 0$ if i > r. Show that R = PAQ, where P is an invertible
m × m matrix and Q is an invertible n × n matrix.


12. The result of Example 16 suggests that perhaps the matrix

$$
A = \begin{bmatrix}
1 & \tfrac12 & \cdots & \tfrac1n \\
\tfrac12 & \tfrac13 & \cdots & \tfrac1{n+1} \\
\vdots & \vdots & & \vdots \\
\tfrac1n & \tfrac1{n+1} & \cdots & \tfrac1{2n-1}
\end{bmatrix}
$$

is invertible and $A^{-1}$ has integer entries. Can you prove that?


2. Vector Spaces 


2.1. Vector Spaces 


In various parts of mathematics, one is confronted with a set, such 
that it is both meaningful and interesting to deal with ‘linear combina- 
tions’ of the objects in that set. For example, in our study of linear equa- 
tions we found it quite natural to consider linear combinations of the 
rows of a matrix. It is likely that the reader has studied calculus and has 
dealt there with linear combinations of functions; certainly this is so if 
he has studied differential equations. Perhaps the reader has had some 
experience with vectors in three-dimensional Euclidean space, and in 
particular, with linear combinations of such vectors. 

Loosely speaking, linear algebra is that branch of mathematics which 
treats the common properties of algebraic systems which consist of a set, 
together with a reasonable notion of a ‘linear combination’ of elements 
in the set. In this section we shall define the mathematical object which 
experience has shown to be the most useful abstraction of this type of 
algebraic system. 


Definition. A vector space (or linear space) consists of the following:


1. a field F of scalars;

2. a set V of objects, called vectors;

3. a rule (or operation), called vector addition, which associates with
each pair of vectors $\alpha, \beta$ in V a vector $\alpha + \beta$ in V, called the sum of $\alpha$ and $\beta$,
in such a way that

(a) addition is commutative, $\alpha + \beta = \beta + \alpha$;
(b) addition is associative, $\alpha + (\beta + \gamma) = (\alpha + \beta) + \gamma$;
(c) there is a unique vector 0 in V, called the zero vector, such that
$\alpha + 0 = \alpha$ for all $\alpha$ in V;

(d) for each vector $\alpha$ in V there is a unique vector $-\alpha$ in V such that
$\alpha + (-\alpha) = 0$;

4. a rule (or operation), called scalar multiplication, which associates
with each scalar c in F and vector $\alpha$ in V a vector $c\alpha$ in V, called the product
of c and $\alpha$, in such a way that

(a) $1\alpha = \alpha$ for every $\alpha$ in V;
(b) $(c_1c_2)\alpha = c_1(c_2\alpha)$;

(c) $c(\alpha + \beta) = c\alpha + c\beta$;
(d) $(c_1 + c_2)\alpha = c_1\alpha + c_2\alpha$.


It is important to observe, as the definition states, that a vector 
space is a composite object consisting of a field, a set of ‘vectors,’ and
two operations with certain special properties. The same set of vectors 
may be part of a number of distinct vector spaces (see Example 5 below). 
When there is no chance of confusion, we may simply refer to the vector 
space as V, or when it is desirable to specify the field, we shall say V is 
a vector space over the field F. The name ‘vector’ is applied to the 
elements of the set V largely as a matter of convenience. The origin of 
the name is to be found in Example 1 below, but one should not attach 
too much significance to the name, since the variety of objects occurring 
as the vectors in V may not bear much resemblance to any preassigned 
concept of vector which the reader has. We shall try to indicate this 
variety by a list of examples; our list will be enlarged considerably as we 
begin to study vector spaces. 


EXAMPLE 1. The n-tuple space, $F^n$. Let F be any field, and let V be
the set of all n-tuples $\alpha = (x_1, x_2, \ldots, x_n)$ of scalars $x_i$ in F. If
$\beta = (y_1, y_2, \ldots, y_n)$ with $y_i$ in F, the sum of $\alpha$ and $\beta$ is defined by

$$
\alpha + \beta = (x_1 + y_1, x_2 + y_2, \ldots, x_n + y_n). \tag{2-1}
$$

The product of a scalar c and vector $\alpha$ is defined by

$$
c\alpha = (cx_1, cx_2, \ldots, cx_n). \tag{2-2}
$$

The fact that this vector addition and scalar multiplication satisfy conditions
(3) and (4) is easy to verify, using the similar properties of addition
and multiplication of elements of F.
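Conditions (3) and (4) for $F^n$ can be spot-checked on particular vectors. The Python sketch below assumes F is the field of rational numbers (via `fractions.Fraction`) and represents n-tuples as Python tuples. It implements (2-1) and (2-2) under the hypothetical names `add` and `smul`. Passing these assertions for a few vectors illustrates the axioms but does not prove them.

```python
from fractions import Fraction

def add(alpha, beta):
    """(2-1): coordinatewise sum of two n-tuples."""
    if len(alpha) != len(beta):
        raise ValueError("both n-tuples must have the same length n")
    return tuple(x + y for x, y in zip(alpha, beta))

def smul(c, alpha):
    """(2-2): multiply every coordinate by the scalar c."""
    return tuple(c * x for x in alpha)

alpha = (Fraction(1, 2), 3, -1)
beta = (2, Fraction(-1, 3), 4)
gamma = (0, 1, Fraction(5, 7))
c1, c2 = Fraction(2, 3), -4
zero = (0, 0, 0)

# Spot checks of conditions (3) and (4) for these particular choices.
checks = [
    add(alpha, beta) == add(beta, alpha),                                # 3(a)
    add(alpha, add(beta, gamma)) == add(add(alpha, beta), gamma),        # 3(b)
    add(alpha, zero) == alpha,                                           # 3(c)
    add(alpha, smul(-1, alpha)) == zero,                                 # 3(d)
    smul(1, alpha) == alpha,                                             # 4(a)
    smul(c1 * c2, alpha) == smul(c1, smul(c2, alpha)),                   # 4(b)
    smul(c1, add(alpha, beta)) == add(smul(c1, alpha), smul(c1, beta)),  # 4(c)
    smul(c1 + c2, alpha) == add(smul(c1, alpha), smul(c2, alpha)),       # 4(d)
]
print(all(checks))   # True
```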


EXAMPLE 2. The space of m × n matrices, $F^{m \times n}$. Let F be any
field and let m and n be positive integers. Let $F^{m \times n}$ be the set of all m × n
matrices over the field F. The sum of two vectors A and B in $F^{m \times n}$ is defined by

$$
(A + B)_{ij} = A_{ij} + B_{ij}. \tag{2-3}
$$

The product of a scalar c and the matrix A is defined by

$$
(cA)_{ij} = cA_{ij}. \tag{2-4}
$$

Note that $F^{1 \times n} = F^n$.


EXAMPLE 3. The space of functions from a set to a field. Let F be
any field and let S be any non-empty set. Let V be the set of all functions
from the set S into F. The sum of two vectors f and g in V is the vector
$f + g$, i.e., the function from S into F, defined by
$$(f + g)(s) = f(s) + g(s). \tag{2-5}$$
The product of the scalar c and the function f is the function cf defined by
$$(cf)(s) = cf(s). \tag{2-6}$$


The preceding examples are special cases of this one. For an n-tuple of
elements of F may be regarded as a function from the set S of integers
$1, \ldots, n$ into F. Similarly, an $m \times n$ matrix over the field F is a function
from the set S of pairs of integers $(i, j)$, $1 \le i \le m$, $1 \le j \le n$, into the
field F. For this third example we shall indicate how one verifies that the
operations we have defined satisfy conditions (3) and (4). For vector
addition:


(a) Since addition in F is commutative,
$$f(s) + g(s) = g(s) + f(s)$$
for each s in S, so the functions $f + g$ and $g + f$ are identical.

(b) Since addition in F is associative,
$$f(s) + [g(s) + h(s)] = [f(s) + g(s)] + h(s)$$
for each s, so $f + (g + h)$ is the same function as $(f + g) + h$.

(c) The unique zero vector is the zero function which assigns to each
element of S the scalar 0 in F.

(d) For each f in V, $(-f)$ is the function which is given by
$$(-f)(s) = -f(s).$$
The reader should find it easy to verify that scalar multiplication 
satisfies the conditions of (4), by arguing as we did with the vector addition. 
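The pointwise operations (2-5) and (2-6) can be sketched in Python when S is finite, by storing a function from S into F as a dictionary `{s: f(s)}`. This is only an illustration under that assumption (an infinite S cannot be stored this way); the helper names are hypothetical, and F is again taken to be the rationals via `Fraction`. The last lines treat an n-tuple as a function on $\{1, \ldots, n\}$, as in the remark above.

```python
from fractions import Fraction

# A function S -> F is stored as a dict {s: f(s)}; S is assumed finite here.
def fadd(f, g):
    """(f + g)(s) = f(s) + g(s), as in (2-5)."""
    if f.keys() != g.keys():
        raise ValueError("f and g must have the same domain S")
    return {s: f[s] + g[s] for s in f}

def fscale(c, f):
    """(cf)(s) = c f(s), as in (2-6)."""
    return {s: c * f[s] for s in f}

def zero(S):
    """The zero function: assigns the scalar 0 to every element of S."""
    return {s: Fraction(0) for s in S}

def neg(f):
    """(-f)(s) = -f(s)."""
    return {s: -v for s, v in f.items()}

S = {"p", "q", "r"}
f = {"p": Fraction(1), "q": Fraction(2, 3), "r": Fraction(-4)}
g = {"p": Fraction(0), "q": Fraction(1, 3), "r": Fraction(5)}

assert fadd(f, g) == fadd(g, f)        # (a) commutativity
assert fadd(f, zero(S)) == f           # (c) zero vector
assert fadd(f, neg(f)) == zero(S)      # (d) additive inverse

# An n-tuple is a function on {1, ..., n}: (1, 2) + (3, 4) = (4, 6).
t = fadd({1: 1, 2: 2}, {1: 3, 2: 4})
assert [t[i] for i in (1, 2)] == [4, 6]
```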


EXAMPLE 4. The space of polynomial functions over a field F.
Let F be a field and let V be the set of all functions f from F into F which
have a rule of the form
$$f(x) = c_0 + c_1x + \cdots + c_nx^n \tag{2-7}$$
where $c_0, c_1, \ldots, c_n$ are fixed scalars in F (independent of x). A function
of this type is called a polynomial function on F. Let addition
and scalar multiplication be defined as in Example 3. One must observe
here that if f and g are polynomial functions and c is in F, then $f + g$ and
$cf$ are again polynomial functions.
EXAMPLE 5. The field C of complex numbers may be regarded as a
vector space over the field R of real numbers. More generally, let F be the
field of real numbers and let V be the set of n-tuples $\alpha = (x_1, \ldots, x_n)$
where $x_1, \ldots, x_n$ are complex numbers. Define addition of vectors and
scalar multiplication by (2-1) and (2-2), as in Example 1. In this way we
obtain a vector space over the field R which is quite different from the
space $C^n$ and the space $R^n$.


There are a few simple facts which follow almost immediately from
the definition of a vector space, and we proceed to derive these. If c is
a scalar and 0 is the zero vector, then by 3(c) and 4(c)
$$c0 = c(0 + 0) = c0 + c0.$$
Adding $-(c0)$ and using 3(d), we obtain
$$c0 = 0. \tag{2-8}$$
Similarly, for the scalar 0 and any vector $\alpha$ we find that
$$0\alpha = 0. \tag{2-9}$$


If c is a non-zero scalar and $\alpha$ is a vector such that $c\alpha = 0$, then by (2-8),
$c^{-1}(c\alpha) = 0$. But
$$c^{-1}(c\alpha) = (c^{-1}c)\alpha = 1\alpha = \alpha$$
hence, $\alpha = 0$. Thus we see that if c is a scalar and $\alpha$ a vector such that
$c\alpha = 0$, then either c is the zero scalar or $\alpha$ is the zero vector.
If $\alpha$ is any vector in V, then
$$0 = 0\alpha = (1 - 1)\alpha = 1\alpha + (-1)\alpha = \alpha + (-1)\alpha$$
from which it follows that
$$(-1)\alpha = -\alpha. \tag{2-10}$$


Finally, the associative and commutative properties of vector addition 
imply that a sum involving a number of vectors is independent of the way 
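in which these vectors are combined and associated. For example, if
$\alpha_1, \alpha_2, \alpha_3, \alpha_4$ are vectors in V, then
$$(\alpha_1 + \alpha_2) + (\alpha_3 + \alpha_4) = [\alpha_2 + (\alpha_3 + \alpha_1)] + \alpha_4$$
and such a sum may be written without confusion as
$$\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4.$$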


Definition. A vector $\beta$ in V is said to be a linear combination of the
vectors $\alpha_1, \ldots, \alpha_n$ in V provided there exist scalars $c_1, \ldots, c_n$ in F such that
$$\beta = c_1\alpha_1 + \cdots + c_n\alpha_n = \sum_{i=1}^{n} c_i\alpha_i.$$
Other extensions of the associative property of vector addition and
the distributive properties 4(c) and 4(d) of scalar multiplication apply
to linear combinations:
$$\sum_{i=1}^{n} c_i\alpha_i + \sum_{i=1}^{n} d_i\alpha_i = \sum_{i=1}^{n} (c_i + d_i)\alpha_i$$
$$c\sum_{i=1}^{n} c_i\alpha_i = \sum_{i=1}^{n} (cc_i)\alpha_i.$$


Certain parts of linear algebra are intimately related to geometry. 
The very word ‘space’ suggests something geometrical, as does the word 
‘vector’ to most people. As we proceed with our study of vector spaces, 
the reader will observe that much of the terminology has a geometrical 
connotation. Before concluding this introductory section on vector spaces, 
we shall consider the relation of vector spaces to geometry to an extent 
which will at least indicate the origin of the name ‘vector space.’ This 
will be a brief intuitive discussion. 

Let us consider the vector space $R^3$. In analytic geometry, one
identifies triples $(x_1, x_2, x_3)$ of real numbers with the points in three-dimensional
Euclidean space. In that context, a vector is usually defined as a directed 
line segment PQ, from a point P in the space to another point Q. This 
amounts to a careful formulation of the idea of the ‘arrow’ from P to Q. 
As vectors are used, it is intended that they should be determined by 
their length and direction. Thus one must identify two directed line seg- 
ments if they have the same length and the same direction. 

The directed line segment PQ, from the point $P = (x_1, x_2, x_3)$ to the
point $Q = (y_1, y_2, y_3)$, has the same length and direction as the directed
line segment from the origin $O = (0, 0, 0)$ to the point
$(y_1 - x_1, y_2 - x_2, y_3 - x_3)$. Furthermore, this is the only segment emanating from the origin
which has the same length and direction as PQ. Thus, if one agrees to 
treat only vectors which emanate from the origin, there is exactly one 
vector associated with each given length and direction. 

The vector OP, from the origin to $P = (x_1, x_2, x_3)$, is completely
determined by P, and it is therefore possible to identify this vector with the
point P. In our definition of the vector space $R^3$, the vectors are simply
defined to be the triples $(x_1, x_2, x_3)$.

Given points $P = (x_1, x_2, x_3)$ and $Q = (y_1, y_2, y_3)$, the definition of
the sum of the vectors OP and OQ can be given geometrically. If the
vectors are not parallel, then the segments OP and OQ determine a plane
and these segments are two of the edges of a parallelogram in that plane
(see Figure 1). One diagonal of this parallelogram extends from O to a
point S, and the sum of OP and OQ is defined to be the vector OS. The
coordinates of the point S are $(x_1 + y_1, x_2 + y_2, x_3 + y_3)$ and hence this
geometrical definition of vector addition is equivalent to the algebraic
definition of Example 1.
FIGURE 1. The parallelogram with edges OP and OQ, where $P = (x_1, x_2, x_3)$ and $Q = (y_1, y_2, y_3)$; its diagonal from O ends at $S = (x_1 + y_1, x_2 + y_2, x_3 + y_3)$.


Scalar multiplication has a simpler geometric interpretation. If c is 
a real number, then the product of c and the vector OP is the vector from 
the origin with length |c| times the length of OP and a direction which 
agrees with the direction of OP if c > 0, and which is opposite to the 
direction of OP if c < 0. This scalar multiplication just yields the vector
OT where $T = (cx_1, cx_2, cx_3)$, and is therefore consistent with the algebraic
definition given for $R^3$.

From time to time, the reader will probably find it helpful to ‘think 
geometrically’ about vector spaces, that is, to draw pictures for his own 
benefit to illustrate and motivate some of the ideas. Indeed, he should do 
this. However, in forming such illustrations he must bear in mind that, 
because we are dealing with vector spaces as algebraic systems, all proofs 
we give will be of an algebraic nature. 


Exercises 
1. If F is a field, verify that $F^n$ (as defined in Example 1) is a vector space over
the field F.
2. If V is a vector space over the field F, verify that
$$(\alpha_1 + \alpha_2) + (\alpha_3 + \alpha_4) = [\alpha_2 + (\alpha_3 + \alpha_1)] + \alpha_4$$
for all vectors $\alpha_1, \alpha_2, \alpha_3$, and $\alpha_4$ in V.


3. If C is the field of complex numbers, which vectors in $C^3$ are linear
combinations of $(1, 0, -1)$, $(0, 1, 1)$, and $(1, 1, 1)$?
4. Let V be the set of all pairs $(x, y)$ of real numbers, and let F be the field of
real numbers. Define
$$(x, y) + (x_1, y_1) = (x + x_1, y + y_1)$$
$$c(x, y) = (cx, y).$$
Is V, with these operations, a vector space over the field of real numbers?


5. On $R^n$, define two operations
$$\alpha \oplus \beta = \alpha - \beta$$
$$c \cdot \alpha = -c\alpha.$$
The operations on the right are the usual ones. Which of the axioms for a vector
space are satisfied by $(R^n, \oplus, \cdot)$?


6. Let V be the set of all complex-valued functions f on the real line such that
(for all t in R)
$$f(-t) = \overline{f(t)}.$$
The bar denotes complex conjugation. Show that V, with the operations
$$(f + g)(t) = f(t) + g(t)$$
$$(cf)(t) = cf(t)$$
is a vector space over the field of real numbers. Give an example of a function in V
which is not real-valued.


7. Let V be the set of pairs $(x, y)$ of real numbers and let F be the field of real
numbers. Define
$$(x, y) + (x_1, y_1) = (x + x_1, 0)$$
$$c(x, y) = (cx, 0).$$
Is V, with these operations, a vector space?


2.2. Subspaces 


In this section we shall introduce some of the basic concepts in the 
study of vector spaces. 


Definition. Let V be a vector space over the field F. A subspace of V 
is a subset W of V which is itself a vector space over F with the operations of 
vector addition and scalar multiplication on V. 


A direct check of the axioms for a vector space shows that the subset
W of V is a subspace if for each $\alpha$ and $\beta$ in W the vector $\alpha + \beta$ is again
in W; the 0 vector is in W; for each $\alpha$ in W the vector $(-\alpha)$ is in W; for
each $\alpha$ in W and each scalar c the vector $c\alpha$ is in W. The commutativity
and associativity of vector addition, and the properties (4)(a), (b), (c),
and (d) of scalar multiplication do not need to be checked, since these
are properties of the operations on V. One can simplify things still further.
Theorem 1. A non-empty subset W of V is a subspace of V if and only
if for each pair of vectors $\alpha$, $\beta$ in W and each scalar c in F the vector $c\alpha + \beta$
is again in W.

Proof. Suppose that W is a non-empty subset of V such that
$c\alpha + \beta$ belongs to W for all vectors $\alpha$, $\beta$ in W and all scalars c in F. Since
W is non-empty, there is a vector $\rho$ in W, and hence $(-1)\rho + \rho = 0$ is
in W. Then if $\alpha$ is any vector in W and c any scalar, the vector $c\alpha = c\alpha + 0$
is in W. In particular, $(-1)\alpha = -\alpha$ is in W. Finally, if $\alpha$ and $\beta$ are in W,
then $\alpha + \beta = 1\alpha + \beta$ is in W. Thus W is a subspace of V.

Conversely, if W is a subspace of V, $\alpha$ and $\beta$ are in W, and c is a scalar,
certainly $c\alpha + \beta$ is in W. ∎


Some people prefer to use the $c\alpha + \beta$ property in Theorem 1 as the
definition of a subspace. It makes little difference. The important point
is that, if W is a non-empty subset of V such that $c\alpha + \beta$ is in W for all $\alpha$,
$\beta$ in W and all c in F, then (with the operations inherited from V) W is a
vector space. This provides us with many new examples of vector spaces.


EXAMPLE 6. 

(a) If V is any vector space, V is a subspace of V; the subset con- 
sisting of the zero vector alone is a subspace of V, called the zero sub- 
space of V. 

(b) In $F^n$, the set of n-tuples $(x_1, \ldots, x_n)$ with $x_1 = 0$ is a subspace;
however, the set of n-tuples with $x_1 = 1 + x_2$ is not a subspace ($n \ge 2$).

(c) The space of polynomial functions over the field F is a subspace 
of the space of all functions from F into F. 

(d) An $n \times n$ (square) matrix A over the field F is symmetric if
$A_{ij} = A_{ji}$ for each i and j. The symmetric matrices form a subspace of
the space of all $n \times n$ matrices over F.

(e) An $n \times n$ (square) matrix A over the field C of complex numbers
is Hermitian (or self-adjoint) if
$$A_{jk} = \overline{A_{kj}}$$
for each j, k, the bar denoting complex conjugation. A $2 \times 2$ matrix is
Hermitian if and only if it has the form
$$\begin{bmatrix} z & x + iy \\ x - iy & w \end{bmatrix}$$
where x, y, z, and w are real numbers. The set of all Hermitian matrices
is not a subspace of the space of all $n \times n$ matrices over C. For if A is
Hermitian, its diagonal entries $A_{11}, A_{22}, \ldots$ are all real numbers, but the
diagonal entries of $iA$ are in general not real. On the other hand, it is easily
verified that the set of $n \times n$ complex Hermitian matrices is a vector
space over the field R of real numbers (with the usual operations).
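The failure of closure under complex scalars in (e), and its survival under real scalars, can be seen on a small example. This Python sketch uses the built-in `complex` type; `is_hermitian` and `combine` are helpers written for this illustration, and the sample matrices are arbitrary choices of the form shown above.

```python
def is_hermitian(A):
    """True if A[j][k] equals the complex conjugate of A[k][j] for all j, k."""
    n = len(A)
    return all(A[j][k] == A[k][j].conjugate() for j in range(n) for k in range(n))

def combine(c, A, B):
    """Entrywise c*A + B."""
    return [[c * a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

A = [[2, 1 + 3j], [1 - 3j, -5]]   # form [[z, x+iy], [x-iy, w]] with real x, y, z, w
B = [[0, 2j], [-2j, 1]]
zero = [[0, 0], [0, 0]]

assert is_hermitian(A) and is_hermitian(B)
assert is_hermitian(combine(3, A, B))            # real scalar: still Hermitian
assert not is_hermitian(combine(1j, A, zero))    # iA has diagonal entry 2i, not real
```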
EXAMPLE 7. The solution space of a system of homogeneous
linear equations. Let A be an $m \times n$ matrix over F. Then the set of all
$n \times 1$ (column) matrices X over F such that $AX = 0$ is a subspace of the
space of all $n \times 1$ matrices over F. To prove this we must show that
$A(cX + Y) = 0$ when $AX = 0$, $AY = 0$, and c is an arbitrary scalar in F.
This follows immediately from the following general fact.

Lemma. If A is an $m \times n$ matrix over F and B, C are $n \times p$ matrices
over F then
$$A(dB + C) = d(AB) + AC \tag{2-11}$$
for each scalar d in F.

Proof.
$$\begin{aligned}
[A(dB + C)]_{ij} &= \sum_k A_{ik}(dB + C)_{kj} \\
&= \sum_k (dA_{ik}B_{kj} + A_{ik}C_{kj}) \\
&= d\sum_k A_{ik}B_{kj} + \sum_k A_{ik}C_{kj} \\
&= d(AB)_{ij} + (AC)_{ij} \\
&= [d(AB) + AC]_{ij}.
\end{aligned}$$
∎


Similarly one can show that (dB + C)A = d(BA) + CA, if the 
matrix sums and products are defined. 
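A quick numerical check of (2-11), and of the closure property it yields for the solution space in Example 7, can be written in a few lines of Python. Matrices are lists of rows, entries are rationals, and the sample values are arbitrary choices for illustration; `matmul` and `lincomb` are helpers defined here, not library functions.

```python
from fractions import Fraction

def matmul(A, B):
    """Matrix product: (AB)[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def lincomb(d, B, C):
    """Entrywise d*B + C for matrices of the same shape."""
    return [[d * b + c for b, c in zip(rb, rc)] for rb, rc in zip(B, C)]

A = [[1, 2, 0], [0, -1, 3]]      # 2 x 3
B = [[1, 0], [2, 1], [0, 4]]     # 3 x 2
C = [[5, -1], [0, 0], [1, 1]]    # 3 x 2
d = Fraction(2, 3)

lhs = matmul(A, lincomb(d, B, C))
rhs = lincomb(d, matmul(A, B), matmul(A, C))
assert lhs == rhs                # (2-11) on this sample

# Example 7: two solutions X, Y of AX = 0, and a combination cX + Y.
X = [[-6], [3], [1]]
Y = [[12], [-6], [-2]]
assert matmul(A, X) == [[0], [0]] and matmul(A, Y) == [[0], [0]]
assert matmul(A, lincomb(Fraction(7, 5), X, Y)) == [[0], [0]]
```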


Theorem 2. Let V be a vector space over the field F. The intersection 
of any collection of subspaces of V is a subspace of V. 


Proof. Let $\{W_a\}$ be a collection of subspaces of V, and let
$W = \bigcap_a W_a$ be their intersection. Recall that W is defined as the set of all
elements belonging to every $W_a$ (see Appendix). Since each $W_a$ is a subspace,
each contains the zero vector. Thus the zero vector is in the intersection
W, and W is non-empty. Let $\alpha$ and $\beta$ be vectors in W and let c be a scalar.
By definition of W, both $\alpha$ and $\beta$ belong to each $W_a$, and because each $W_a$
is a subspace, the vector $(c\alpha + \beta)$ is in every $W_a$. Thus $(c\alpha + \beta)$ is again
in W. By Theorem 1, W is a subspace of V. ∎


From Theorem 2 it follows that if S is any collection of vectors in V, 
then there is a smallest subspace of V which contains S, that is, a sub- 
space which contains S and which is contained in every other subspace 
containing S. 


Definition. Let S be a set of vectors in a vector space V. The subspace
spanned by S is defined to be the intersection W of all subspaces of V which
contain S. When S is a finite set of vectors, $S = \{\alpha_1, \alpha_2, \ldots, \alpha_n\}$, we shall
simply call W the subspace spanned by the vectors $\alpha_1, \alpha_2, \ldots, \alpha_n$.
Theorem 3. The subspace spanned by a non-empty subset S of a vector
space V is the set of all linear combinations of vectors in S.

Proof. Let W be the subspace spanned by S. Then each linear
combination
$$\alpha = x_1\alpha_1 + x_2\alpha_2 + \cdots + x_m\alpha_m$$
of vectors $\alpha_1, \alpha_2, \ldots, \alpha_m$ in S is clearly in W. Thus W contains the set L
of all linear combinations of vectors in S. The set L, on the other hand,
contains S and is non-empty. If $\alpha$, $\beta$ belong to L then $\alpha$ is a linear
combination,
$$\alpha = x_1\alpha_1 + x_2\alpha_2 + \cdots + x_m\alpha_m$$
of vectors $\alpha_i$ in S, and $\beta$ is a linear combination,
$$\beta = y_1\beta_1 + y_2\beta_2 + \cdots + y_n\beta_n$$
of vectors $\beta_j$ in S. For each scalar c,
$$c\alpha + \beta = \sum_{i=1}^{m} (cx_i)\alpha_i + \sum_{j=1}^{n} y_j\beta_j.$$
Hence $c\alpha + \beta$ belongs to L. Thus L is a subspace of V.

Now we have shown that L is a subspace of V which contains S, and
also that any subspace which contains S contains L. It follows that L is
the intersection of all subspaces containing S, i.e., that L is the subspace
spanned by the set S. ∎


Definition. If $S_1, S_2, \ldots, S_k$ are subsets of a vector space V, the set of
all sums
$$\alpha_1 + \alpha_2 + \cdots + \alpha_k$$
of vectors $\alpha_i$ in $S_i$ is called the sum of the subsets $S_1, S_2, \ldots, S_k$ and is
denoted by
$$S_1 + S_2 + \cdots + S_k$$
or by
$$\sum_{i=1}^{k} S_i.$$

If $W_1, W_2, \ldots, W_k$ are subspaces of V, then the sum
$$W = W_1 + W_2 + \cdots + W_k$$
is easily seen to be a subspace of V which contains each of the subspaces
$W_i$. From this it follows, as in the proof of Theorem 3, that W is the
subspace spanned by the union of $W_1, W_2, \ldots, W_k$.


EXAMPLE 8. Let F be a subfield of the field C of complex numbers.
Suppose
$$\begin{aligned}
\alpha_1 &= (1, 2, 0, 3, 0) \\
\alpha_2 &= (0, 0, 1, 4, 0) \\
\alpha_3 &= (0, 0, 0, 0, 1).
\end{aligned}$$
By Theorem 3, a vector $\alpha$ is in the subspace W of $F^5$ spanned by $\alpha_1, \alpha_2, \alpha_3$
if and only if there exist scalars $c_1, c_2, c_3$ in F such that
$$\alpha = c_1\alpha_1 + c_2\alpha_2 + c_3\alpha_3.$$
Thus W consists of all vectors of the form
$$\alpha = (c_1, 2c_1, c_2, 3c_1 + 4c_2, c_3)$$
where $c_1, c_2, c_3$ are arbitrary scalars in F. Alternatively, W can be described
as the set of all 5-tuples
$$\alpha = (x_1, x_2, x_3, x_4, x_5)$$
with $x_i$ in F such that
$$\begin{aligned}
x_2 &= 2x_1 \\
x_4 &= 3x_1 + 4x_3.
\end{aligned}$$
Thus $(-3, -6, 1, -5, 2)$ is in W, whereas $(2, 4, 6, 7, 8)$ is not.
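The two descriptions of W in Example 8 can be compared directly. In this Python sketch (integer entries, so taking F to be the rationals suffices), `in_W` tests the two linear conditions and `combo` builds $c_1\alpha_1 + c_2\alpha_2 + c_3\alpha_3$; both names are hypothetical. Reading off $c_1 = x_1$, $c_2 = x_3$, $c_3 = x_5$ gives the coefficients for any vector in W.

```python
def in_W(x):
    """Membership test from Example 8: x2 = 2*x1 and x4 = 3*x1 + 4*x3."""
    x1, x2, x3, x4, x5 = x
    return x2 == 2 * x1 and x4 == 3 * x1 + 4 * x3

def combo(c1, c2, c3):
    """Return c1*alpha1 + c2*alpha2 + c3*alpha3 for the vectors of Example 8."""
    a1, a2, a3 = (1, 2, 0, 3, 0), (0, 0, 1, 4, 0), (0, 0, 0, 0, 1)
    return tuple(c1 * u + c2 * v + c3 * w for u, v, w in zip(a1, a2, a3))

assert in_W((-3, -6, 1, -5, 2))
assert not in_W((2, 4, 6, 7, 8))     # x4 would have to be 3*2 + 4*6 = 30
# The witness for the first vector: c1 = x1, c2 = x3, c3 = x5.
assert combo(-3, 1, 2) == (-3, -6, 1, -5, 2)
```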
EXAMPLE 9. Let F be a subfield of the field C of complex numbers,
and let V be the vector space of all $2 \times 2$ matrices over F. Let $W_1$ be the
subset of V consisting of all matrices of the form
$$\begin{bmatrix} x & y \\ z & 0 \end{bmatrix}$$
where x, y, z are arbitrary scalars in F. Finally, let $W_2$ be the subset of V
consisting of all matrices of the form
$$\begin{bmatrix} x & 0 \\ 0 & y \end{bmatrix}$$
where x and y are arbitrary scalars in F. Then $W_1$ and $W_2$ are subspaces
of V. Also
$$V = W_1 + W_2$$
because
$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} a & b \\ c & 0 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0 & d \end{bmatrix}.$$
The subspace $W_1 \cap W_2$ consists of all matrices of the form
$$\begin{bmatrix} x & 0 \\ 0 & 0 \end{bmatrix}.$$

EXAMPLE 10. Let A be an $m \times n$ matrix over a field F. The row
vectors of A are the vectors in $F^n$ given by $\alpha_i = (A_{i1}, \ldots, A_{in})$, $i = 1, \ldots, m$.
The subspace of $F^n$ spanned by the row vectors of A is called the row
space of A. The subspace considered in Example 8 is the row space of the
matrix
$$A = \begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
It is also the row space of the matrix
$$B = \begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ -4 & -8 & 1 & -8 & 0 \end{bmatrix},$$
since the fourth row of B is $-4\alpha_1 + \alpha_2$.
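That A and B have the same row space can be checked mechanically: two finite sets of vectors span the same subspace exactly when each set has the same rank as their union. The Python sketch below computes ranks by Gaussian elimination with exact `Fraction` arithmetic; the `rank` helper is our own and assumes integer or rational entries.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue                       # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:  # eliminate col from every other row
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

A = [[1, 2, 0, 3, 0], [0, 0, 1, 4, 0], [0, 0, 0, 0, 1]]
B = A + [[-4, -8, 1, -8, 0]]
# Equal row spaces: rank(A) == rank(B) == rank of all rows together.
assert rank(A) == rank(B) == rank(A + B) == 3
```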


EXAMPLE 11. Let V be the space of all polynomial functions over F.
Let S be the subset of V consisting of the polynomial functions $f_0, f_1, f_2, \ldots$
defined by
$$f_n(x) = x^n, \quad n = 0, 1, 2, \ldots.$$
Then V is the subspace spanned by the set S.


Exercises 


1. Which of the following sets of vectors $\alpha = (a_1, \ldots, a_n)$ in $R^n$ are subspaces
of $R^n$ ($n \ge 3$)?
(a) all $\alpha$ such that $a_1 \ge 0$;
(b) all $\alpha$ such that $a_1 + 3a_2 = a_3$;
(c) all $\alpha$ such that $a_2 = a_1^2$;
(d) all $\alpha$ such that $a_1a_2 = 0$;
(e) all $\alpha$ such that $a_2$ is rational.


2. Let V be the (real) vector space of all functions f from R into R. Which of the 
following sets of functions are subspaces of V? 


(a) all f such that $f(x^2) = f(x)^2$;

(b) all f such that $f(0) = f(1)$;

(c) all f such that $f(3) = 1 + f(-5)$;
(d) all f such that $f(-1) = 0$;

(e) all f which are continuous.


3. Is the vector $(3, -1, 0, -1)$ in the subspace of $R^4$ spanned by the vectors
$(2, -1, 3, 2)$, $(-1, 1, 1, -3)$, and $(1, 1, 9, -5)$?


4. Let W be the set of all $(x_1, x_2, x_3, x_4, x_5)$ in $R^5$ which satisfy
$$\begin{aligned}
2x_1 - x_2 + \tfrac{4}{3}x_3 - x_4 &= 0 \\
x_1 + \tfrac{2}{3}x_3 - x_5 &= 0 \\
9x_1 - 3x_2 + 6x_3 - 3x_4 - 3x_5 &= 0.
\end{aligned}$$
Find a finite set of vectors which spans W.
5. Let F be a field and let n be a positive integer ($n \ge 2$). Let V be the vector
space of all $n \times n$ matrices over F. Which of the following sets of matrices A in V
are subspaces of V?

(a) all invertible A;

(b) all non-invertible A;

(c) all A such that $AB = BA$, where B is some fixed matrix in V;
(d) all A such that $A^2 = A$.


6. (a) Prove that the only subspaces of $R^1$ are $R^1$ and the zero subspace.

(b) Prove that a subspace of $R^2$ is $R^2$, or the zero subspace, or consists of all
scalar multiples of some fixed vector in $R^2$. (The last type of subspace is, intuitively,
a straight line through the origin.)

(c) Can you describe the subspaces of $R^3$?


7. Let $W_1$ and $W_2$ be subspaces of a vector space V such that the set-theoretic
union of $W_1$ and $W_2$ is also a subspace. Prove that one of the spaces $W_i$ is contained
in the other.


8. Let V be the vector space of all functions from R into R; let $V_e$ be the
subset of even functions, $f(-x) = f(x)$; let $V_o$ be the subset of odd functions,
$f(-x) = -f(x)$.

(a) Prove that $V_e$ and $V_o$ are subspaces of V.
(b) Prove that $V_e + V_o = V$.
(c) Prove that $V_e \cap V_o = \{0\}$.


9. Let $W_1$ and $W_2$ be subspaces of a vector space V such that $W_1 + W_2 = V$
and $W_1 \cap W_2 = \{0\}$. Prove that for each vector $\alpha$ in V there are unique vectors
$\alpha_1$ in $W_1$ and $\alpha_2$ in $W_2$ such that $\alpha = \alpha_1 + \alpha_2$.


2.3. Bases and Dimension 


We turn now to the task of assigning a dimension to certain vector 
spaces. Although we usually associate ‘dimension’ with something geomet- 
rical, we must find a suitable algebraic definition of the dimension of a 
vector space. This will be done through the concept of a basis for the space. 


Definition. Let V be a vector space over F. A subset S of V is said to
be linearly dependent (or simply, dependent) if there exist distinct vectors
$\alpha_1, \alpha_2, \ldots, \alpha_n$ in S and scalars $c_1, c_2, \ldots, c_n$ in F, not all of which are 0,
such that
$$c_1\alpha_1 + c_2\alpha_2 + \cdots + c_n\alpha_n = 0.$$
A set which is not linearly dependent is called linearly independent. If
the set S contains only finitely many vectors $\alpha_1, \alpha_2, \ldots, \alpha_n$, we sometimes say
that $\alpha_1, \alpha_2, \ldots, \alpha_n$ are dependent (or independent) instead of saying S is
dependent (or independent).
The following are easy consequences of the definition.

1. Any set which contains a linearly dependent set is linearly dependent.

2. Any subset of a linearly independent set is linearly independent.

3. Any set which contains the 0 vector is linearly dependent; for
$1 \cdot 0 = 0$.

4. A set S of vectors is linearly independent if and only if each finite
subset of S is linearly independent, i.e., if and only if for any distinct
vectors $\alpha_1, \ldots, \alpha_n$ of S, $c_1\alpha_1 + \cdots + c_n\alpha_n = 0$ implies each $c_i = 0$.


Definition. Let V be a vector space. A basis for V is a linearly
independent set of vectors in V which spans the space V. The space V is
finite-dimensional if it has a finite basis.


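EXAMPLE 12. Let F be a subfield of the complex numbers. In $F^3$ the
vectors
$$\begin{aligned}
\alpha_1 &= (3, 0, -3) \\
\alpha_2 &= (-1, 1, 2) \\
\alpha_3 &= (4, 2, -2) \\
\alpha_4 &= (2, 1, 1)
\end{aligned}$$
are linearly dependent, since
$$2\alpha_1 + 2\alpha_2 - \alpha_3 + 0 \cdot \alpha_4 = 0.$$
The vectors
$$\begin{aligned}
\epsilon_1 &= (1, 0, 0) \\
\epsilon_2 &= (0, 1, 0) \\
\epsilon_3 &= (0, 0, 1)
\end{aligned}$$
are linearly independent.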
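Both claims of Example 12 reduce to arithmetic, which this short Python sketch carries out with integer entries; `combination` is a helper introduced here. The second check anticipates the computation of Example 13 in the case n = 3: a combination of $\epsilon_1, \epsilon_2, \epsilon_3$ reproduces its own coefficients, so it vanishes only when all of them are 0.

```python
alphas = [(3, 0, -3), (-1, 1, 2), (4, 2, -2), (2, 1, 1)]
coeffs = [2, 2, -1, 0]

def combination(cs, vs):
    """Return the sum of c * v over paired scalars cs and vectors vs."""
    return tuple(sum(c * v[k] for c, v in zip(cs, vs)) for k in range(len(vs[0])))

# The dependence relation 2*a1 + 2*a2 - a3 + 0*a4 = 0 of Example 12.
assert combination(coeffs, alphas) == (0, 0, 0)

# x1*e1 + x2*e2 + x3*e3 equals (x1, x2, x3), so it is zero only when
# every coefficient is zero: the e_i are linearly independent.
eps = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
x = (5, -7, 2)
assert combination(x, eps) == x
```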


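EXAMPLE 13. Let F be a field and in $F^n$ let S be the subset consisting
of the vectors $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ defined by
$$\begin{aligned}
\epsilon_1 &= (1, 0, 0, \ldots, 0) \\
\epsilon_2 &= (0, 1, 0, \ldots, 0) \\
&\;\;\vdots \\
\epsilon_n &= (0, 0, 0, \ldots, 1).
\end{aligned}$$
Let $x_1, x_2, \ldots, x_n$ be scalars in F and put $\alpha = x_1\epsilon_1 + x_2\epsilon_2 + \cdots + x_n\epsilon_n$.
Then
$$\alpha = (x_1, x_2, \ldots, x_n). \tag{2-12}$$
This shows that $\epsilon_1, \ldots, \epsilon_n$ span $F^n$. Since $\alpha = 0$ if and only if
$x_1 = x_2 = \cdots = x_n = 0$, the vectors $\epsilon_1, \ldots, \epsilon_n$ are linearly independent. The
set $S = \{\epsilon_1, \ldots, \epsilon_n\}$ is accordingly a basis for $F^n$. We shall call this
particular basis the standard basis of $F^n$.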
EXAMPLE 14. Let P be an invertible $n \times n$ matrix with entries in
the field F. Then $P_1, \ldots, P_n$, the columns of P, form a basis for the space
of column matrices, $F^{n \times 1}$. We see that as follows. If X is a column matrix,
then
$$PX = x_1P_1 + \cdots + x_nP_n.$$
Since $PX = 0$ has only the trivial solution $X = 0$, it follows that
$\{P_1, \ldots, P_n\}$ is a linearly independent set. Why does it span $F^{n \times 1}$? Let Y
be any column matrix. If $X = P^{-1}Y$, then $Y = PX$, that is,
$$Y = x_1P_1 + \cdots + x_nP_n.$$
So $\{P_1, \ldots, P_n\}$ is a basis for $F^{n \times 1}$.


EXAMPLE 15. Let A be an $m \times n$ matrix and let S be the solution
space for the homogeneous system $AX = 0$ (Example 7). Let R be a
row-reduced echelon matrix which is row-equivalent to A. Then S is also the
solution space for the system $RX = 0$. If R has r non-zero rows, then the
system of equations $RX = 0$ simply expresses r of the unknowns $x_1, \ldots, x_n$
in terms of the remaining $(n - r)$ unknowns $x_j$. Suppose that the leading
non-zero entries of the non-zero rows occur in columns $k_1, \ldots, k_r$. Let J
be the set consisting of the $n - r$ indices different from $k_1, \ldots, k_r$:
$$J = \{1, \ldots, n\} - \{k_1, \ldots, k_r\}.$$
The system $RX = 0$ has the form
$$\begin{aligned}
x_{k_1} + \sum_{J} c_{1j}x_j &= 0 \\
&\;\;\vdots \\
x_{k_r} + \sum_{J} c_{rj}x_j &= 0
\end{aligned}$$
where the $c_{ij}$ are certain scalars. All solutions are obtained by assigning
(arbitrary) values to those $x_j$'s with j in J and computing the
corresponding values of $x_{k_1}, \ldots, x_{k_r}$. For each j in J, let $E_j$ be the solution obtained
by setting $x_j = 1$ and $x_i = 0$ for all other i in J. We assert that the $(n - r)$
vectors $E_j$, j in J, form a basis for the solution space.

Since the column matrix $E_j$ has a 1 in row j and zeros in the rows
indexed by other elements of J, the reasoning of Example 13 shows us
that the set of these vectors is linearly independent. That set spans the
solution space, for this reason. If the column matrix T, with entries
$t_1, \ldots, t_n$, is in the solution space, the matrix
$$N = \sum_{J} t_jE_j$$
is also in the solution space and is a solution such that $x_j = t_j$ for each
j in J. The solution with that property is unique; hence, $N = T$ and T is
in the span of the vectors $E_j$.
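The construction of the basis $\{E_j\}$ in Example 15 is an algorithm, sketched here in Python under the assumption that the entries are rational. `rref` performs Gauss-Jordan elimination with `Fraction` arithmetic and returns the pivot columns $k_1, \ldots, k_r$; `null_space_basis` then sets $x_j = 1$ for one free index j, sets the other free unknowns to 0, and reads off $x_{k_i} = -c_{ij}$ from row i of R. Both helper names are hypothetical, and indices are 0-based in the code but 1-based in the text.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in r] for r in rows]
    pivots, r = [], 0
    for col in range(len(R[0])):
        piv = next((i for i in range(r, len(R)) if R[i][col] != 0), None)
        if piv is None:
            continue                                  # no leading entry in this column
        R[r], R[piv] = R[piv], R[r]
        lead = R[r][col]
        R[r] = [v / lead for v in R[r]]               # make the leading entry 1
        for i in range(len(R)):
            if i != r and R[i][col] != 0:             # clear the rest of the column
                f = R[i][col]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(col)
        r += 1
    return R, pivots

def null_space_basis(A):
    """The vectors E_j (j in J) of Example 15, each as a list of n entries."""
    R, ks = rref(A)
    n = len(A[0])
    J = [j for j in range(n) if j not in ks]          # indices of the free unknowns
    basis = []
    for j in J:
        E = [Fraction(0)] * n
        E[j] = Fraction(1)                            # x_j = 1, other free x_i = 0
        for row, k in enumerate(ks):                  # row reads x_k + sum_J c x_j = 0
            E[k] = -R[row][j]
        basis.append(E)
    return basis

A = [[1, 2, 0, 3], [2, 4, 1, 10]]
basis = null_space_basis(A)
assert len(basis) == len(A[0]) - 2                    # n - r with r = 2 here
for E in basis:
    assert all(sum(a * x for a, x in zip(row, E)) == 0 for row in A)
```

For this A the row-reduced form has pivots in the first and third columns, so the free unknowns are the second and fourth, and the sketch returns $(-2, 1, 0, 0)$ and $(-3, 0, -4, 1)$.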
EXAMPLE 16. We shall now give an example of an infinite basis. Let
F be a subfield of the complex numbers and let V be the space of
polynomial functions over F. Recall that these functions are the functions
from F into F which have a rule of the form
$$f(x) = c_0 + c_1x + \cdots + c_nx^n.$$
Let $f_k(x) = x^k$, $k = 0, 1, 2, \ldots$. The (infinite) set $\{f_0, f_1, f_2, \ldots\}$ is a basis
for V. Clearly the set spans V, because the function f (above) is
$$f = c_0f_0 + c_1f_1 + \cdots + c_nf_n.$$
The reader should see that this is virtually a repetition of the definition
of polynomial function, that is, a function f from F into F is a polynomial
function if and only if there exists an integer n and scalars $c_0, \ldots, c_n$ such
that $f = c_0f_0 + \cdots + c_nf_n$. Why are the functions independent? To show
that the set $\{f_0, f_1, f_2, \ldots\}$ is independent means to show that each finite
subset of it is independent. It will suffice to show that, for each n, the set
$\{f_0, \ldots, f_n\}$ is independent. Suppose that
$$c_0f_0 + \cdots + c_nf_n = 0.$$
This says that
$$c_0 + c_1x + \cdots + c_nx^n = 0$$
for every x in F; in other words, every x in F is a root of the polynomial
$f(x) = c_0 + c_1x + \cdots + c_nx^n$. We assume that the reader knows that a
non-zero polynomial of degree at most n with complex coefficients cannot have more than n
distinct roots. Since F contains the rational numbers, it is infinite, so this polynomial
has infinitely many roots and must be the zero polynomial. It follows that
$c_0 = c_1 = \cdots = c_n = 0$.

We have exhibited an infinite basis for V. Does that mean that V is
not finite-dimensional? As a matter of fact it does; however, that is not
immediate from the definition, because for all we know V might also have
a finite basis. That possibility is easily eliminated. (We shall eliminate it
in general in the next theorem.) Suppose that we have a finite number of
polynomial functions $g_1, \ldots, g_r$. There will be a largest power of x which
appears (with non-zero coefficient) in $g_1(x), \ldots, g_r(x)$. If that power is k,
clearly $f(x) = x^{k+1}$ is not in the linear span of $g_1, \ldots, g_r$. So V is not
finite-dimensional.


A final remark about this example is in order. Infinite bases have
nothing to do with ‘infinite linear combinations.’ The reader who feels an
irresistible urge to inject power series
$$\sum_{k=0}^{\infty} c_kx^k$$
into this example should study the example carefully again. If that does
not effect a cure, he should consider restricting his attention to
finite-dimensional spaces from now on.
Theorem 4. Let V be a vector space which is spanned by a finite set of
vectors $\beta_1, \beta_2, \ldots, \beta_m$. Then any independent set of vectors in V is finite and
contains no more than m elements.

Proof. To prove the theorem it suffices to show that every subset
S of V which contains more than m vectors is linearly dependent. Let S be
such a set. In S there are distinct vectors $\alpha_1, \alpha_2, \ldots, \alpha_n$ where $n > m$.
Since $\beta_1, \ldots, \beta_m$ span V, there exist scalars $A_{ij}$ in F such that
$$\alpha_j = \sum_{i=1}^{m} A_{ij}\beta_i.$$
For any n scalars $x_1, x_2, \ldots, x_n$ we have
$$\begin{aligned}
x_1\alpha_1 + \cdots + x_n\alpha_n &= \sum_{j=1}^{n} x_j\alpha_j \\
&= \sum_{j=1}^{n} x_j \sum_{i=1}^{m} A_{ij}\beta_i \\
&= \sum_{j=1}^{n} \sum_{i=1}^{m} (A_{ij}x_j)\beta_i \\
&= \sum_{i=1}^{m} \Bigl(\sum_{j=1}^{n} A_{ij}x_j\Bigr)\beta_i.
\end{aligned}$$
Since $n > m$, Theorem 6 of Chapter 1 implies that there exist scalars
$x_1, x_2, \ldots, x_n$ not all 0 such that
$$\sum_{j=1}^{n} A_{ij}x_j = 0, \quad 1 \le i \le m.$$
Hence $x_1\alpha_1 + x_2\alpha_2 + \cdots + x_n\alpha_n = 0$. This shows that S is a linearly
dependent set. ∎


Corollary 1. If V is a finite-dimensional vector space, then any two
bases of V have the same (finite) number of elements.

Proof. Since V is finite-dimensional, it has a finite basis
$$\{\beta_1, \beta_2, \ldots, \beta_m\}.$$
By Theorem 4 every basis of V is finite and contains no more than m
elements. Thus if $\{\alpha_1, \alpha_2, \ldots, \alpha_n\}$ is a basis, $n \le m$. By the same
argument, $m \le n$. Hence $m = n$. ∎


This corollary allows us to define the dimension of a finite-dimensional
vector space as the number of elements in a basis for V. We shall denote
the dimension of a finite-dimensional space V by dim V. This allows us
to reformulate Theorem 4 as follows.


Corollary 2. Let V be a finite-dimensional vector space and let
$n = \dim V$. Then
(a) any subset of V which contains more than n vectors is linearly
dependent;
(b) no subset of V which contains fewer than n vectors can span V.


EXAMPLE 17. If F is a field, the dimension of $F^n$ is n, because the
standard basis for $F^n$ contains n vectors. The matrix space $F^{m \times n}$ has
dimension mn. That should be clear by analogy with the case of $F^n$,
because the mn matrices which have a 1 in the i, j place with zeros elsewhere
form a basis for $F^{m \times n}$. If A is an $m \times n$ matrix, then the solution space
for A has dimension $n - r$, where r is the number of non-zero rows in a
row-reduced echelon matrix which is row-equivalent to A. See Example 15.

If V is any vector space over F, the zero subspace of V is spanned by 
the vector 0, but {0} is a linearly dependent set and not a basis. For this 
reason, we shall agree that the zero subspace has dimension 0. Alterna- 
tively, we could reach the same conclusion by arguing that the empty set 
is a basis for the zero subspace. The empty set spans {0}, because the 
intersection of all subspaces containing the empty set is {0}, and the 
empty set is linearly independent because it contains no vectors. 


Lemma. Let S be a linearly independent subset of a vector space V.
Suppose $\beta$ is a vector in V which is not in the subspace spanned by S. Then
the set obtained by adjoining $\beta$ to S is linearly independent.

Proof. Suppose $\alpha_1, \ldots, \alpha_m$ are distinct vectors in S and that
$$c_1\alpha_1 + \cdots + c_m\alpha_m + b\beta = 0.$$
Then $b = 0$; for otherwise,
$$\beta = \Bigl(-\frac{c_1}{b}\Bigr)\alpha_1 + \cdots + \Bigl(-\frac{c_m}{b}\Bigr)\alpha_m$$
and $\beta$ is in the subspace spanned by S. Thus $c_1\alpha_1 + \cdots + c_m\alpha_m = 0$, and
since S is a linearly independent set each $c_i = 0$. ∎


Theorem 5. If W is a subspace of a finite-dimensional vector space V,
every linearly independent subset of W is finite and is part of a (finite) basis
for W.

Proof. Suppose $S_0$ is a linearly independent subset of W. If S is
a linearly independent subset of W containing $S_0$, then S is also a linearly
independent subset of V; since V is finite-dimensional, S contains no more
than dim V elements.

We extend $S_0$ to a basis for W, as follows. If $S_0$ spans W, then $S_0$ is a
basis for W and we are done. If $S_0$ does not span W, we use the preceding
lemma to find a vector $\beta_1$ in W such that the set $S_1 = S_0 \cup \{\beta_1\}$ is
independent. If $S_1$ spans W, fine. If not, apply the lemma to obtain a vector $\beta_2$
in W such that $S_2 = S_1 \cup \{\beta_2\}$ is independent. If we continue in this way,
then (in not more than dim V steps) we reach a set
$$S_m = S_0 \cup \{\beta_1, \ldots, \beta_m\}$$
which is a basis for W. ∎
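The proof of Theorem 5 is constructive. For the special case $W = V = Q^n$, the following Python sketch follows it literally: it runs through the standard basis vectors as candidates and adjoins each one that is not in the span of the current set, which by the Lemma keeps the set independent. Taking the candidates from the standard basis is our choice, not part of the theorem (any finite spanning set of W would do), and `rank` and `extend_to_basis` are helpers written for this illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in r] for r in rows]
    r = 0
    for col in range(len(M[0]) if M else 0):
        piv = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][col] != 0:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def extend_to_basis(S0, n):
    """Extend a linearly independent list S0 in Q^n to a basis of Q^n."""
    S = [list(v) for v in S0]
    if rank(S) != len(S):
        raise ValueError("S0 must be linearly independent")
    for i in range(n):
        e = [1 if k == i else 0 for k in range(n)]   # candidate beta = epsilon_i
        if rank(S + [e]) > len(S):                   # beta not in span(S): adjoin it
            S.append(e)
    return S

basis = extend_to_basis([(1, 2, 0), (0, 1, 1)], 3)
assert len(basis) == 3 and rank(basis) == 3
```

Each candidate ends up either adjoined or already in the span, so after the loop every $\epsilon_i$ lies in the span of S, which therefore spans $Q^n$.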


Corollary 1. If W is a proper subspace of a finite-dimensional vector
space V, then W is finite-dimensional and $\dim W < \dim V$.

Proof. If W is the zero subspace, then $\dim W = 0 < \dim V$, since $V \ne W$.
So we may suppose W contains a vector $\alpha \ne 0$. By Theorem
5 and its proof, there is a basis of W containing $\alpha$ which contains no more
than dim V elements. Hence W is finite-dimensional, and $\dim W \le \dim V$.
Since W is a proper subspace, there is a vector $\beta$ in V which is not in W.
Adjoining $\beta$ to any basis of W, we obtain a linearly independent subset
of V with $\dim W + 1$ elements. Thus, by Theorem 4, $\dim W < \dim V$. ∎


Corollary 2. In a finite-dimensional vector space V every non-empty 
linearly independent set of vectors is part of a basis. 


Corollary 3. Let A be an $n \times n$ matrix over a field F, and suppose the
row vectors of A form a linearly independent set of vectors in $F^n$. Then A is
invertible.


Proof. Let $\alpha_1, \alpha_2, \ldots, \alpha_n$ be the row vectors of A, and suppose
W is the subspace of $F^n$ spanned by $\alpha_1, \alpha_2, \ldots, \alpha_n$. Since $\alpha_1, \alpha_2, \ldots, \alpha_n$
are linearly independent, the dimension of W is n. Corollary 1 now shows
that $W = F^n$. Hence there exist scalars $B_{ij}$ in F such that
$$\epsilon_i = \sum_{j=1}^{n} B_{ij}\alpha_j, \qquad 1 \le i \le n$$
where $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ is the standard basis of $F^n$. Thus for the matrix B
with entries $B_{ij}$ we have
$$BA = I.$$ ∎
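The proof obtains B from the expressions of the standard basis vectors in terms of the rows of A. In practice the same B can be computed by Gauss-Jordan elimination on the augmented matrix [A | I]. That standard technique is swapped in here purely for illustration; it is not the method of the proof. The sketch works over Q, and `left_inverse` is our own name.

```python
from fractions import Fraction


def left_inverse(A):
    """Return B with BA = I for a square matrix A whose rows are independent.

    Gauss-Jordan elimination on [A | I]: the row operations that turn A into I
    turn I into B. Raises ValueError if the rows of A are dependent.
    """
    n = len(A)
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = next((i for i in range(c, n) if M[i][c] != 0), None)
        if p is None:
            raise ValueError("rows of A are linearly dependent")
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n:] for row in M]


A = [[1, 2], [3, 4]]
B = left_inverse(A)
BA = [[sum(B[i][k] * A[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
print([[str(v) for v in row] for row in B])   # [['-2', '1'], ['3/2', '-1/2']]
print(BA == [[1, 0], [0, 1]])                  # True
```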

Theorem 6. If $W_1$ and $W_2$ are finite-dimensional subspaces of a vector
space V, then $W_1 + W_2$ is finite-dimensional and
$$\dim W_1 + \dim W_2 = \dim (W_1 \cap W_2) + \dim (W_1 + W_2).$$


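Proof. By Theorem 5 and its corollaries, $W_1 \cap W_2$ has a finite
basis $\{\alpha_1, \ldots, \alpha_k\}$ which is part of a basis
$$\{\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_m\} \quad \text{for } W_1$$
and part of a basis
$$\{\alpha_1, \ldots, \alpha_k, \gamma_1, \ldots, \gamma_n\} \quad \text{for } W_2.$$
The subspace $W_1 + W_2$ is spanned by the vectors
$$\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_m, \gamma_1, \ldots, \gamma_n$$
and these vectors form an independent set. For suppose
$$\sum x_i\alpha_i + \sum y_j\beta_j + \sum z_r\gamma_r = 0.$$
Then
$$-\sum z_r\gamma_r = \sum x_i\alpha_i + \sum y_j\beta_j$$
which shows that $\sum z_r\gamma_r$ belongs to $W_1$. As $\sum z_r\gamma_r$ also belongs to $W_2$ it
follows that
$$\sum z_r\gamma_r = \sum c_i\alpha_i$$
for certain scalars $c_1, \ldots, c_k$. Because the set
$$\{\alpha_1, \ldots, \alpha_k, \gamma_1, \ldots, \gamma_n\}$$
is independent, each of the scalars $z_r = 0$. Thus
$$\sum x_i\alpha_i + \sum y_j\beta_j = 0$$
and since
$$\{\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_m\}$$
is also an independent set, each $x_i = 0$ and each $y_j = 0$. Thus,
$$\{\alpha_1, \ldots, \alpha_k, \beta_1, \ldots, \beta_m, \gamma_1, \ldots, \gamma_n\}$$
is a basis for $W_1 + W_2$. Finally
$$\begin{aligned}
\dim W_1 + \dim W_2 &= (k + m) + (k + n)\\
&= k + (m + k + n)\\
&= \dim (W_1 \cap W_2) + \dim (W_1 + W_2).
\end{aligned}$$ ∎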
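Theorem 6 can be spot-checked on a concrete pair of subspaces of Q^4. The sketch below computes the intersection directly instead of reading its dimension off the formula: it solves x·A1 = y·A2 for the coefficient vectors x and y. The helpers `rref`, `rank`, `null_space`, and `intersection` are our own, chosen for this illustration.

```python
from fractions import Fraction


def rref(rows):
    """Row-reduced echelon form over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def rank(vectors):
    return len(rref(vectors)[1]) if vectors else 0


def null_space(M):
    """Basis of {z : M z = 0}: one vector per non-pivot column of rref(M)."""
    R, piv = rref(M)
    n = len(M[0])
    basis = []
    for free in [c for c in range(n) if c not in piv]:
        z = [Fraction(0)] * n
        z[free] = Fraction(1)
        for i, c in enumerate(piv):
            z[c] = -R[i][free]
        basis.append(z)
    return basis


def intersection(A1, A2):
    """Spanning list of span(A1) ∩ span(A2), where A1 and A2 list basis vectors.

    x·A1 lies in both spaces iff x·A1 = y·A2 for some y, i.e. iff (x, y)
    solves x·A1 - y·A2 = 0; each solution is mapped back to x·A1.
    """
    k1, n = len(A1), len(A1[0])
    # M has one row per coordinate j; its columns are the rows of A1
    # followed by the negated rows of A2
    M = [[A1[i][j] for i in range(k1)] + [-v[j] for v in A2] for j in range(n)]
    return [[sum(z[i] * A1[i][j] for i in range(k1)) for j in range(n)]
            for z in null_space(M)]


W1 = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]   # basis of W1
W2 = [(0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]   # basis of W2
dim_sum = rank(W1 + W2)            # list concatenation: a spanning list of W1 + W2
dim_cap = rank(intersection(W1, W2))
print(rank(W1) + rank(W2), dim_cap + dim_sum)   # 6 6
```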


Let us close this section with a remark about linear independence
and dependence. We defined these concepts for sets of vectors. It is useful
to have them defined for finite sequences (ordered n-tuples) of vectors:
$\alpha_1, \ldots, \alpha_n$. We say that the vectors $\alpha_1, \ldots, \alpha_n$ are linearly dependent
if there exist scalars $c_1, \ldots, c_n$, not all 0, such that $c_1\alpha_1 + \cdots + c_n\alpha_n = 0$.
This is all so natural that the reader may find that he has been using this
terminology already. What is the difference between a finite sequence
$\alpha_1, \ldots, \alpha_n$ and a set $\{\alpha_1, \ldots, \alpha_n\}$? There are two differences, identity
and order.

If we discuss the set $\{\alpha_1, \ldots, \alpha_n\}$, usually it is presumed that no
two of the vectors $\alpha_1, \ldots, \alpha_n$ are identical. In a sequence $\alpha_1, \ldots, \alpha_n$ all
the $\alpha_i$'s may be the same vector. If $\alpha_i = \alpha_j$ for some $i \neq j$, then the
sequence $\alpha_1, \ldots, \alpha_n$ is linearly dependent:
$$\alpha_i + (-1)\alpha_j = 0.$$
Thus, if $\alpha_1, \ldots, \alpha_n$ are linearly independent, they are distinct and we
may talk about the set $\{\alpha_1, \ldots, \alpha_n\}$ and know that it has n vectors in it.
So, clearly, no confusion will arise in discussing bases and dimension. The
dimension of a finite-dimensional space V is the largest n such that some
n-tuple of vectors in V is linearly independent, and so on. The reader
who feels that this paragraph is much ado about nothing might ask himself
whether the vectors
$$\alpha_1 = (e^{\sqrt{2}}, 1)$$
$$\alpha_2 = (\sqrt{110}, 1)$$
are linearly independent in $R^2$.

The elements of a sequence are enumerated in a specific order. A set 
is a collection of objects, with no specified arrangement or order. Of 
course, to describe the set we may list its members, and that requires 
choosing an order. But, the order is not part of the set. The sets {1, 2, 3, 4} 
and {4, 3, 2, 1} are identical, whereas 1, 2, 3, 4 is quite a different sequence 
from 4, 3, 2, 1. The order aspect of sequences has no bearing on ques- 
tions of independence, dependence, etc., because dependence (as defined) 
is not affected by the order. The sequence $\alpha_n, \ldots, \alpha_1$ is dependent if and
only if the sequence $\alpha_1, \ldots, \alpha_n$ is dependent. In the next section, order
will be important. 
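For a finite sequence of vectors in Q^n, independence can be tested by comparing the rank of the sequence with its length. The sketch below uses a `rank` helper of our own and illustrates the two differences just discussed. A repeated vector forces dependence, while the order of the vectors has no effect.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors over Q (forward Gaussian elimination)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for c in range(len(rows[0]) if rows else 0):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def is_independent_sequence(seq):
    """A finite sequence of vectors is independent iff its rank equals its length."""
    return rank(seq) == len(seq)


seq = [(1, 2), (3, 4)]
print(is_independent_sequence(seq))               # True
print(is_independent_sequence(seq[::-1]))         # True: reversing the order changes nothing
print(is_independent_sequence([(1, 2), (1, 2)]))  # False: a repeated vector
print(len({(1, 2), (1, 2)}))                      # 1: as a set, the repetition disappears
```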


Exercises 


1. Prove that if two vectors are linearly dependent, one of them is a scalar 
multiple of the other. 


2. Are the vectors
$$\alpha_1 = (1, 1, 2, 4), \quad \alpha_2 = (2, -1, -5, 2),$$
$$\alpha_3 = (1, -1, -4, 0), \quad \alpha_4 = (2, 1, 1, 6)$$
linearly independent in $R^4$?
3. Find a basis for the subspace of $R^4$ spanned by the four vectors of Exercise 2.
4. Show that the vectors
$$\alpha_1 = (1, 0, -1), \quad \alpha_2 = (1, 2, 1), \quad \alpha_3 = (0, -3, 2)$$
form a basis for $R^3$. Express each of the standard basis vectors as linear combinations
of $\alpha_1$, $\alpha_2$, and $\alpha_3$.


5. Find three vectors in $R^3$ which are linearly dependent, and are such that
any two of them are linearly independent.


6. Let V be the vector space of all $2 \times 2$ matrices over the field F. Prove that V
has dimension 4 by exhibiting a basis for V which has four elements.


7. Let V be the vector space of Exercise 6. Let $W_1$ be the set of matrices of the
form


and let $W_2$ be the set of matrices of the form


(a) Prove that $W_1$ and $W_2$ are subspaces of V.
(b) Find the dimensions of $W_1$, $W_2$, $W_1 + W_2$, and $W_1 \cap W_2$.


8. Again let V be the space of $2 \times 2$ matrices over F. Find a basis $\{A_1, A_2, A_3, A_4\}$
for V such that $A_j^2 = A_j$ for each j.


9. Let V be a vector space over a subfield F of the complex numbers. Suppose
$\alpha$, $\beta$, and $\gamma$ are linearly independent vectors in V. Prove that $(\alpha + \beta)$, $(\beta + \gamma)$,
and $(\gamma + \alpha)$ are linearly independent.


10. Let V be a vector space over the field F. Suppose there are a finite number
of vectors $\alpha_1, \ldots, \alpha_r$ in V which span V. Prove that V is finite-dimensional.


11. Let V be the set of all $2 \times 2$ matrices A with complex entries which satisfy
$A_{11} + A_{22} = 0$.

(a) Show that V is a vector space over the field of real numbers, with the
usual operations of matrix addition and multiplication of a matrix by a scalar.

(b) Find a basis for this vector space.

(c) Let W be the set of all matrices A in V such that $A_{21} = -\overline{A_{12}}$ (the bar
denotes complex conjugation). Prove that W is a subspace of V and find a basis
for W.


12. Prove that the space of all $m \times n$ matrices over the field F has dimension mn,
by exhibiting a basis for this space.


13. Discuss Exercise 9, when V is a vector space over the field with two elements 
described in Exercise 5, Section 1.1. 


14. Let V be the set of real numbers. Regard V as a vector space over the field
of rational numbers, with the usual operations. Prove that this vector space is not
finite-dimensional.




2.4. Coordinates 


One of the useful features of a basis $\mathcal{B}$ in an n-dimensional space V is
that it essentially enables one to introduce coordinates in V analogous to
the 'natural coordinates' $x_i$ of a vector $\alpha = (x_1, \ldots, x_n)$ in the space $F^n$.
In this scheme, the coordinates of a vector $\alpha$ in V relative to the basis $\mathcal{B}$
will be the scalars which serve to express $\alpha$ as a linear combination of the
vectors in the basis. Thus, we should like to regard the natural coordinates
of a vector $\alpha$ in $F^n$ as being defined by $\alpha$ and the standard basis for $F^n$;
however, in adopting this point of view we must exercise a certain amount
of care. If
$$\alpha = (x_1, \ldots, x_n) = \sum x_i\epsilon_i$$
and $\mathcal{B}$ is the standard basis for $F^n$, just how are the coordinates of $\alpha$
determined by $\mathcal{B}$ and $\alpha$? One way to phrase the answer is this. A given vector $\alpha$
has a unique expression as a linear combination of the standard basis
vectors, and the ith coordinate $x_i$ of $\alpha$ is the coefficient of $\epsilon_i$ in this
expression. From this point of view we are able to say which is the ith coordinate
because we have a 'natural' ordering of the vectors in the standard basis,
that is, we have a rule for determining which is the 'first' vector in the
basis, which is the 'second,' and so on. If $\mathcal{B}$ is an arbitrary basis of the
n-dimensional space V, we shall probably have no natural ordering of the
vectors in $\mathcal{B}$, and it will therefore be necessary for us to impose some
order on these vectors before we can define 'the ith coordinate of $\alpha$
relative to $\mathcal{B}$.' To put it another way, coordinates will be defined relative to
sequences of vectors rather than sets of vectors.


Definition. If V is a finite-dimensional vector space, an ordered basis 
for V is a finite sequence of vectors which is linearly independent and spans V. 


If the sequence $\alpha_1, \ldots, \alpha_n$ is an ordered basis for V, then the set
$\{\alpha_1, \ldots, \alpha_n\}$ is a basis for V. The ordered basis is the set, together with
the specified ordering. We shall engage in a slight abuse of notation and
describe all that by saying that
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$$
is an ordered basis for V.
Now suppose V is a finite-dimensional vector space over the field F
and that
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$$
is an ordered basis for V. Given $\alpha$ in V, there is a unique n-tuple
$(x_1, \ldots, x_n)$ of scalars such that
$$\alpha = \sum_{i=1}^{n} x_i\alpha_i.$$
The n-tuple is unique, because if we also have
$$\alpha = \sum_{i=1}^{n} z_i\alpha_i$$
then
$$\sum_{i=1}^{n} (x_i - z_i)\alpha_i = 0$$
and the linear independence of the $\alpha_i$ tells us that $x_i - z_i = 0$ for each i.
We shall call $x_i$ the ith coordinate of $\alpha$ relative to the ordered basis
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}.$$
If
$$\beta = \sum y_i\alpha_i$$
then
$$\alpha + \beta = \sum (x_i + y_i)\alpha_i$$
so that the ith coordinate of $(\alpha + \beta)$ in this ordered basis is $(x_i + y_i)$.
Similarly, the ith coordinate of $(c\alpha)$ is $cx_i$. One should also note that every
n-tuple $(x_1, \ldots, x_n)$ in $F^n$ is the n-tuple of coordinates of some vector in
V, namely the vector
$$\sum_{i=1}^{n} x_i\alpha_i.$$

To summarize, each ordered basis for V determines a one-one
correspondence
$$\alpha \to (x_1, \ldots, x_n)$$
between the set of all vectors in V and the set of all n-tuples in $F^n$. This
correspondence has the property that the correspondent of $(\alpha + \beta)$ is the
sum in $F^n$ of the correspondents of $\alpha$ and $\beta$, and that the correspondent
of $(c\alpha)$ is the product in $F^n$ of the scalar c and the correspondent of $\alpha$.
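Finding the coordinates of α relative to an ordered basis of F^n amounts to solving the linear system x_1α_1 + ... + x_nα_n = α. The sketch below works over Q. The function `coordinates` and the basis of Q^2 are our own choices for illustration. The sketch also checks that the correspondence α → (x_1, ..., x_n) respects sums and scalar multiples, and that reordering the basis reorders the coordinates.

```python
from fractions import Fraction


def coordinates(basis, alpha):
    """Coordinate n-tuple of alpha relative to the ordered basis `basis` of Q^n.

    Gauss-Jordan elimination on the augmented matrix whose columns are
    alpha_1, ..., alpha_n, alpha. Assumes `basis` really is a basis.
    """
    n = len(basis)
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(alpha[i])]
         for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)  # exists for a basis
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n] for row in M]


B = [(1, 1), (1, -1)]            # an ordered basis of Q^2, chosen for illustration
alpha, beta = (3, 1), (1, 3)
x, y = coordinates(B, alpha), coordinates(B, beta)
print([str(v) for v in x], [str(v) for v in y])   # ['2', '1'] ['2', '-1']

# The correspondence respects sums and scalar multiples.
s = coordinates(B, (alpha[0] + beta[0], alpha[1] + beta[1]))
assert s == [a + b for a, b in zip(x, y)]
assert coordinates(B, (5 * alpha[0], 5 * alpha[1])) == [5 * a for a in x]

# Reordering the basis reorders the coordinates: the order is part of the data.
print([str(v) for v in coordinates(B[::-1], alpha)])  # ['1', '2']
```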

One might wonder at this point why we do not simply select some 
ordered basis for V and describe each vector in V by its corresponding 
n-tuple of coordinates, since we would then have the convenience of oper- 
ating only with n-tuples. This would defeat our purpose, for two reasons. 
First, as our axiomatic definition of vector space indicates, we are attempt- 
ing to learn to reason with vector spaces as abstract algebraic systems. 
Second, even in those situations in which we use coordinates, the signifi- 
cant results follow from our ability to change the coordinate system, i.e., 
to change the ordered basis. 

Frequently, it will be more convenient for us to use the coordinate
matrix of $\alpha$ relative to the ordered basis $\mathcal{B}$:
$$X = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$
rather than the n-tuple $(x_1, \ldots, x_n)$ of coordinates. To indicate the
dependence of this coordinate matrix on the basis, we shall use the symbol
$$[\alpha]_{\mathcal{B}}$$
for the coordinate matrix of the vector $\alpha$ relative to the ordered basis $\mathcal{B}$.
This notation will be particularly useful as we now proceed to describe
what happens to the coordinates of a vector $\alpha$ as we change from one
ordered basis to another.

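Suppose then that V is n-dimensional and that
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\} \quad \text{and} \quad \mathcal{B}' = \{\alpha'_1, \ldots, \alpha'_n\}$$
are two ordered bases for V. There are unique scalars $P_{ij}$ such that
$$\alpha'_j = \sum_{i=1}^{n} P_{ij}\alpha_i, \qquad 1 \le j \le n. \tag{2-13}$$
Let $x'_1, \ldots, x'_n$ be the coordinates of a given vector $\alpha$ in the ordered basis
$\mathcal{B}'$. Then
$$\begin{aligned}
\alpha &= x'_1\alpha'_1 + \cdots + x'_n\alpha'_n\\
&= \sum_{j=1}^{n} x'_j\alpha'_j\\
&= \sum_{j=1}^{n} x'_j \sum_{i=1}^{n} P_{ij}\alpha_i\\
&= \sum_{j=1}^{n}\sum_{i=1}^{n} (P_{ij}x'_j)\alpha_i\\
&= \sum_{i=1}^{n}\Big(\sum_{j=1}^{n} P_{ij}x'_j\Big)\alpha_i.
\end{aligned}$$
Thus we obtain the relation
$$\alpha = \sum_{i=1}^{n}\Big(\sum_{j=1}^{n} P_{ij}x'_j\Big)\alpha_i. \tag{2-14}$$
Since the coordinates $x_1, x_2, \ldots, x_n$ of $\alpha$ in the ordered basis $\mathcal{B}$ are uniquely
determined, it follows from (2-14) that
$$x_i = \sum_{j=1}^{n} P_{ij}x'_j, \qquad 1 \le i \le n. \tag{2-15}$$
Let P be the $n \times n$ matrix whose i, j entry is the scalar $P_{ij}$, and let X and
X' be the coordinate matrices of the vector $\alpha$ in the ordered bases $\mathcal{B}$ and
$\mathcal{B}'$. Then we may reformulate (2-15) as
$$X = PX'. \tag{2-16}$$
Since $\mathcal{B}$ and $\mathcal{B}'$ are linearly independent sets, X = 0 if and only if X' = 0.
Thus from (2-16) and Theorem 7 of Chapter 1, it follows that P is invertible.
Hence
$$X' = P^{-1}X. \tag{2-17}$$
If we use the notation introduced above for the coordinate matrix of a
vector relative to an ordered basis, then (2-16) and (2-17) say
$$\begin{aligned}
[\alpha]_{\mathcal{B}} &= P[\alpha]_{\mathcal{B}'}\\
[\alpha]_{\mathcal{B}'} &= P^{-1}[\alpha]_{\mathcal{B}}.
\end{aligned}$$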


Thus the preceding discussion may be summarized as follows.

Theorem 7. Let V be an n-dimensional vector space over the field F,
and let $\mathcal{B}$ and $\mathcal{B}'$ be two ordered bases of V. Then there is a unique, necessarily
invertible, $n \times n$ matrix P with entries in F such that

(i) $[\alpha]_{\mathcal{B}} = P[\alpha]_{\mathcal{B}'}$
(ii) $[\alpha]_{\mathcal{B}'} = P^{-1}[\alpha]_{\mathcal{B}}$

for every vector $\alpha$ in V. The columns of P are given by
$$P_j = [\alpha'_j]_{\mathcal{B}}, \qquad j = 1, \ldots, n.$$
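Theorem 7 translates directly into a computation. Build P column by column from the coordinates of the vectors α'_j in 𝓑, then check that X = PX'. The sketch below works over Q with two ordered bases of Q^2 chosen only for illustration. The helpers `coordinates`, `change_of_basis_matrix`, and `mat_vec` are hypothetical names of our own.

```python
from fractions import Fraction


def coordinates(basis, alpha):
    """[alpha]_basis over Q, by Gauss-Jordan on [alpha_1 ... alpha_n | alpha]."""
    n = len(basis)
    M = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(alpha[i])]
         for i in range(n)]
    for c in range(n):
        p = next(i for i in range(c, n) if M[i][c] != 0)  # exists for a basis
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for i in range(n):
            if i != c:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[c])]
    return [row[n] for row in M]


def change_of_basis_matrix(B, B_prime):
    """The matrix P whose j-th column is [alpha'_j]_B (Theorem 7)."""
    cols = [coordinates(B, a) for a in B_prime]
    n = len(B)
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def mat_vec(P, x):
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


B = [(1, 1), (1, -1)]          # ordered basis B (illustrative choice)
B_prime = [(2, 0), (1, 3)]     # ordered basis B' (illustrative choice)
P = change_of_basis_matrix(B, B_prime)
alpha = (5, 7)
X, X_prime = coordinates(B, alpha), coordinates(B_prime, alpha)
print([[str(v) for v in row] for row in P])   # [['1', '2'], ['1', '-1']]
print(mat_vec(P, X_prime) == X)               # True: X = P X'
```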
To complete the above analysis we shall also prove the following
result.


Theorem 8. Suppose P is an $n \times n$ invertible matrix over F. Let V
be an n-dimensional vector space over F, and let $\mathcal{B}$ be an ordered basis of V.
Then there is a unique ordered basis $\mathcal{B}'$ of V such that

(i) $[\alpha]_{\mathcal{B}} = P[\alpha]_{\mathcal{B}'}$
(ii) $[\alpha]_{\mathcal{B}'} = P^{-1}[\alpha]_{\mathcal{B}}$

for every vector $\alpha$ in V.

Proof. Let $\mathcal{B}$ consist of the vectors $\alpha_1, \ldots, \alpha_n$. If
$\mathcal{B}' = \{\alpha'_1, \ldots, \alpha'_n\}$ is an ordered basis of V for which (i) is valid, it is clear that
$$\alpha'_j = \sum_{i=1}^{n} P_{ij}\alpha_i.$$
Thus we need only show that the vectors $\alpha'_j$, defined by these equations,
form a basis. Let $Q = P^{-1}$. Then
$$\begin{aligned}
\sum_j Q_{jk}\alpha'_j &= \sum_j Q_{jk}\sum_i P_{ij}\alpha_i\\
&= \sum_j\sum_i P_{ij}Q_{jk}\alpha_i\\
&= \sum_i\Big(\sum_j P_{ij}Q_{jk}\Big)\alpha_i\\
&= \sum_i (PQ)_{ik}\,\alpha_i\\
&= \alpha_k.
\end{aligned}$$
Thus the subspace spanned by the set
$$\mathcal{B}' = \{\alpha'_1, \ldots, \alpha'_n\}$$
contains $\mathcal{B}$ and hence equals V. Thus $\mathcal{B}'$ is a basis, and from its definition
and Theorem 7, it is clear that (i) is valid and hence also (ii). ∎


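EXAMPLE 18. Let F be a field and let
$$\alpha = (x_1, x_2, \ldots, x_n)$$
be a vector in $F^n$. If $\mathcal{B}$ is the standard ordered basis of $F^n$,
$$\mathcal{B} = \{\epsilon_1, \ldots, \epsilon_n\}$$
the coordinate matrix of the vector $\alpha$ in the basis $\mathcal{B}$ is given by
$$[\alpha]_{\mathcal{B}} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.$$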


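EXAMPLE 19. Let R be the field of the real numbers and let $\theta$ be a
fixed real number. The matrix
$$P = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
is invertible with inverse
$$P^{-1} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$
Thus for each $\theta$ the set $\mathcal{B}'$ consisting of the vectors $(\cos\theta, \sin\theta)$, $(-\sin\theta, \cos\theta)$
is a basis for $R^2$; intuitively this basis may be described as the one
obtained by rotating the standard basis through the angle $\theta$. If $\alpha$ is the
vector $(x_1, x_2)$, then
$$[\alpha]_{\mathcal{B}'} = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
or
$$\begin{aligned}
x'_1 &= x_1\cos\theta + x_2\sin\theta\\
x'_2 &= -x_1\sin\theta + x_2\cos\theta.
\end{aligned}$$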
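Here is a quick floating-point check of Example 19. The sketch computes the rotated coordinates with the formulas above, then recombines them with the rotated basis vectors to recover α. The angle, the vector, and the name `rotated_coordinates` are arbitrary choices made for the illustration.

```python
import math


def rotated_coordinates(x1, x2, theta):
    """Coordinates (x1', x2') of alpha = (x1, x2) in the basis
    (cos t, sin t), (-sin t, cos t); this is [alpha]_B' = P^{-1} [alpha]_B."""
    c, s = math.cos(theta), math.sin(theta)
    return x1 * c + x2 * s, -x1 * s + x2 * c


theta = math.pi / 6
x1p, x2p = rotated_coordinates(1.0, 2.0, theta)

# Rebuild alpha as x1' (cos t, sin t) + x2' (-sin t, cos t)
c, s = math.cos(theta), math.sin(theta)
back = (x1p * c - x2p * s, x1p * s + x2p * c)
print(all(math.isclose(u, v) for u, v in zip(back, (1.0, 2.0))))  # True
```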


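EXAMPLE 20. Let F be a subfield of the complex numbers. The matrix
$$P = \begin{bmatrix} -1 & 4 & 5 \\ 0 & 2 & -3 \\ 0 & 0 & 8 \end{bmatrix}$$
is invertible with inverse
$$P^{-1} = \begin{bmatrix} -1 & 2 & \frac{11}{8} \\ 0 & \frac{1}{2} & \frac{3}{16} \\ 0 & 0 & \frac{1}{8} \end{bmatrix}.$$
Thus the vectors
$$\alpha'_1 = (-1, 0, 0), \quad \alpha'_2 = (4, 2, 0), \quad \alpha'_3 = (5, -3, 8)$$
form a basis $\mathcal{B}'$ of $F^3$. The coordinates $x'_1, x'_2, x'_3$ of the vector $\alpha = (x_1, x_2, x_3)$
in the basis $\mathcal{B}'$ are given by
$$\begin{bmatrix} x'_1 \\ x'_2 \\ x'_3 \end{bmatrix} = \begin{bmatrix} -x_1 + 2x_2 + \frac{11}{8}x_3 \\ \frac{1}{2}x_2 + \frac{3}{16}x_3 \\ \frac{1}{8}x_3 \end{bmatrix} = \begin{bmatrix} -1 & 2 & \frac{11}{8} \\ 0 & \frac{1}{2} & \frac{3}{16} \\ 0 & 0 & \frac{1}{8} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}.$$
In particular,
$$(3, 2, -8) = -10\alpha'_1 - \tfrac{1}{2}\alpha'_2 - \alpha'_3.$$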
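The numbers in Example 20 can be verified with exact rational arithmetic. The sketch below takes F = Q and uses a small `matmul` helper of our own. It confirms that the stated matrix is the inverse of P, and that (3, 2, -8) has coordinates (-10, -1/2, -1) in 𝓑'. It then recombines those coordinates with the columns of P, which by Theorem 8 are the vectors α'_j.

```python
from fractions import Fraction as Fr

P = [[-1, 4, 5], [0, 2, -3], [0, 0, 8]]
P_inv = [[-1, 2, Fr(11, 8)], [0, Fr(1, 2), Fr(3, 16)], [0, 0, Fr(1, 8)]]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


assert matmul(P, P_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

alpha = [[3], [2], [-8]]                  # [alpha]_B, B the standard basis
X_prime = [row[0] for row in matmul(P_inv, alpha)]
print([str(v) for v in X_prime])          # ['-10', '-1/2', '-1']

# The basis vectors alpha'_j are the columns of P; recombine them.
cols = [[P[i][j] for i in range(3)] for j in range(3)]
recombined = [sum(X_prime[j] * cols[j][i] for j in range(3)) for i in range(3)]
print(recombined == [3, 2, -8])           # True
```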


Exercises 


1. Show that the vectors
$$\alpha_1 = (1, 1, 0, 0), \quad \alpha_2 = (0, 0, 1, 1),$$
$$\alpha_3 = (1, 0, 0, 4), \quad \alpha_4 = (0, 0, 0, 2)$$
form a basis for $R^4$. Find the coordinates of each of the standard basis vectors
in the ordered basis $\{\alpha_1, \alpha_2, \alpha_3, \alpha_4\}$.
2. Find the coordinate matrix of the vector (1, 0, 1) in the basis of $C^3$ consisting
of the vectors $(2i, 1, 0)$, $(2, -1, 1)$, $(0, 1 + i, 1 - i)$, in that order.
3. Let $\mathcal{B} = \{\alpha_1, \alpha_2, \alpha_3\}$ be the ordered basis for $R^3$ consisting of
$$\alpha_1 = (1, 0, -1), \quad \alpha_2 = (1, 1, 1), \quad \alpha_3 = (1, 0, 0).$$
What are the coordinates of the vector (a, b, c) in the ordered basis $\mathcal{B}$?


4. Let W be the subspace of $C^3$ spanned by $\alpha_1 = (1, 0, i)$ and $\alpha_2 = (1 + i, 1, -1)$.
(a) Show that $\alpha_1$ and $\alpha_2$ form a basis for W.
(b) Show that the vectors $\beta_1 = (1, 1, 0)$ and $\beta_2 = (1, i, 1 + i)$ are in W and
form another basis for W.
(c) What are the coordinates of $\alpha_1$ and $\alpha_2$ in the ordered basis $\{\beta_1, \beta_2\}$ for W?


5. Let $\alpha = (x_1, x_2)$ and $\beta = (y_1, y_2)$ be vectors in $R^2$ such that
$$x_1y_1 + x_2y_2 = 0, \qquad x_1^2 + x_2^2 = y_1^2 + y_2^2 = 1.$$
Prove that $\mathcal{B} = \{\alpha, \beta\}$ is a basis for $R^2$. Find the coordinates of the vector (a, b)
in the ordered basis $\mathcal{B} = \{\alpha, \beta\}$. (The conditions on $\alpha$ and $\beta$ say, geometrically,
that $\alpha$ and $\beta$ are perpendicular and each has length 1.)


6. Let V be the vector space over the complex numbers of all functions from R
into C, i.e., the space of all complex-valued functions on the real line. Let $f_1(x) = 1$,
$f_2(x) = e^{ix}$, $f_3(x) = e^{-ix}$.

(a) Prove that $f_1$, $f_2$, and $f_3$ are linearly independent.
(b) Let $g_1(x) = 1$, $g_2(x) = \cos x$, $g_3(x) = \sin x$. Find an invertible $3 \times 3$ matrix
P such that
$$g_j = \sum_{i=1}^{3} P_{ij}f_i.$$

7. Let V be the (real) vector space of all polynomial functions from R into R
of degree 2 or less, i.e., the space of all functions f of the form
$$f(x) = c_0 + c_1x + c_2x^2.$$
Let t be a fixed real number and define
$$g_1(x) = 1, \quad g_2(x) = x + t, \quad g_3(x) = (x + t)^2.$$
Prove that $\mathcal{B} = \{g_1, g_2, g_3\}$ is a basis for V. If
$$f(x) = c_0 + c_1x + c_2x^2$$
what are the coordinates of f in this ordered basis $\mathcal{B}$?
In this section we shall utilize some elementary facts on bases and
dimension in finite-dimensional vector spaces to complete our discussion
of row-equivalence of matrices. We recall that if A is an $m \times n$ matrix
over the field F the row vectors of A are the vectors $\alpha_1, \ldots, \alpha_m$ in $F^n$
defined by
$$\alpha_i = (A_{i1}, \ldots, A_{in})$$
and that the row space of A is the subspace of $F^n$ spanned by these vectors.
The row rank of A is the dimension of the row space of A.

If P is a $k \times m$ matrix over F, then the product B = PA is a $k \times n$
matrix whose row vectors $\beta_1, \ldots, \beta_k$ are linear combinations
$$\beta_i = P_{i1}\alpha_1 + \cdots + P_{im}\alpha_m$$
of the row vectors of A. Thus the row space of B is a subspace of the row
space of A. If P is an $m \times m$ invertible matrix, then B is row-equivalent
to A so that the symmetry of row-equivalence, or the equation $A = P^{-1}B$,
implies that the row space of A is also a subspace of the row space of B.


Theorem 9. Row-equivalent matrices have the same row space. 


Thus we see that to study the row space of A we may as well study 
the row space of a row-reduced echelon matrix which is row-equivalent 
to A. This we proceed to do. 


Theorem 10. Let R be a non-zero row-reduced echelon matrix. Then
the non-zero row vectors of R form a basis for the row space of R.

Proof. Let $\rho_1, \ldots, \rho_r$ be the non-zero row vectors of R:
$$\rho_i = (R_{i1}, \ldots, R_{in}).$$
Certainly these vectors span the row space of R; we need only prove they
are linearly independent. Since R is a row-reduced echelon matrix, there
are positive integers $k_1, \ldots, k_r$ such that, for $i \le r$
$$\begin{aligned}
&\text{(a)}\quad R(i, j) = 0 \ \text{ if } j < k_i\\
&\text{(b)}\quad R(i, k_j) = \delta_{ij}\\
&\text{(c)}\quad k_1 < \cdots < k_r.
\end{aligned} \tag{2-18}$$
Suppose $\beta = (b_1, \ldots, b_n)$ is a vector in the row space of R:
$$\beta = c_1\rho_1 + \cdots + c_r\rho_r. \tag{2-19}$$
Then we claim that $c_j = b_{k_j}$. For, by (2-18)
$$b_{k_j} = \sum_{i=1}^{r} c_iR(i, k_j) = \sum_{i=1}^{r} c_i\delta_{ij} = c_j. \tag{2-20}$$
In particular, if $\beta = 0$, i.e., if $c_1\rho_1 + \cdots + c_r\rho_r = 0$, then $c_j$ must be the
$k_j$th coordinate of the zero vector so that $c_j = 0$, $j = 1, \ldots, r$. Thus
$\rho_1, \ldots, \rho_r$ are linearly independent.


Theorem 11. Let m and n be positive integers and let F be a field.
Suppose W is a subspace of $F^n$ and $\dim W \le m$. Then there is precisely one
$m \times n$ row-reduced echelon matrix over F which has W as its row space.
Proof. There is at least one $m \times n$ row-reduced echelon matrix
with row space W. Since $\dim W \le m$, we can select some m vectors
$\alpha_1, \ldots, \alpha_m$ in W which span W. Let A be the $m \times n$ matrix with row
vectors $\alpha_1, \ldots, \alpha_m$ and let R be a row-reduced echelon matrix which is
row-equivalent to A. Then the row space of R is W.

Now let R be any row-reduced echelon matrix which has W as its row
space. Let $\rho_1, \ldots, \rho_r$ be the non-zero row vectors of R and suppose that
the leading non-zero entry of $\rho_i$ occurs in column $k_i$, $i = 1, \ldots, r$. The
vectors $\rho_1, \ldots, \rho_r$ form a basis for W. In the proof of Theorem 10, we
observed that if $\beta = (b_1, \ldots, b_n)$ is in W, then
$$\beta = c_1\rho_1 + \cdots + c_r\rho_r,$$
and $c_i = b_{k_i}$; in other words, the unique expression for $\beta$ as a linear
combination of $\rho_1, \ldots, \rho_r$ is
$$\beta = \sum_{i=1}^{r} b_{k_i}\rho_i. \tag{2-21}$$
Thus any vector $\beta$ is determined if one knows the coordinates $b_{k_i}$, $i = 1, \ldots, r$.
For example, $\rho_s$ is the unique vector in W which has $k_s$th coordinate 1
and $k_i$th coordinate 0 for $i \neq s$.

Suppose $\beta$ is in W and $\beta \neq 0$. We claim the first non-zero coordinate
of $\beta$ occurs in one of the columns $k_s$. Since
$$\beta = \sum_{i=1}^{r} b_{k_i}\rho_i$$
and $\beta \neq 0$, we can write
$$\beta = \sum_{i=s}^{r} b_{k_i}\rho_i, \qquad b_{k_s} \neq 0. \tag{2-22}$$
From the conditions (2-18) one has $R_{ij} = 0$ if $i > s$ and $j \le k_s$. Thus
$$\beta = (0, \ldots, 0, b_{k_s}, \ldots, b_n), \qquad b_{k_s} \neq 0$$
and the first non-zero coordinate of $\beta$ occurs in column $k_s$. Note also that
for each $k_s$, $s = 1, \ldots, r$, there exists a vector in W which has a non-zero
$k_s$th coordinate, namely $\rho_s$.

It is now clear that R is uniquely determined by W. The description
of R in terms of W is as follows. We consider all vectors $\beta = (b_1, \ldots, b_n)$
in W. If $\beta \neq 0$, then the first non-zero coordinate of $\beta$ must occur in some
column t:
$$\beta = (0, \ldots, 0, b_t, \ldots, b_n), \qquad b_t \neq 0.$$
Let $k_1, \ldots, k_r$ be those positive integers t such that there is some $\beta \neq 0$
in W, the first non-zero coordinate of which occurs in column t. Arrange
$k_1, \ldots, k_r$ in the order $k_1 < k_2 < \cdots < k_r$. For each of the positive
integers $k_s$ there will be one and only one vector $\rho_s$ in W such that the
$k_s$th coordinate of $\rho_s$ is 1 and the $k_i$th coordinate of $\rho_s$ is 0 for $i \neq s$. Then
R is the $m \times n$ matrix which has row vectors $\rho_1, \ldots, \rho_r, 0, \ldots, 0$. ∎
Corollary. Each $m \times n$ matrix A is row-equivalent to one and only
one row-reduced echelon matrix.


Proof. We know that A is row-equivalent to at least one row-reduced
echelon matrix R. If A is row-equivalent to another such matrix
R', then R is row-equivalent to R'; hence, R and R' have the same row
space and must be identical. ∎


Corollary. Let A and B be $m \times n$ matrices over the field F. Then A
and B are row-equivalent if and only if they have the same row space.


Proof. We know that if A and B are row-equivalent, then they
have the same row space. So suppose that A and B have the same row
space. Now A is row-equivalent to a row-reduced echelon matrix R and
B is row-equivalent to a row-reduced echelon matrix R'. Since A and B
have the same row space, R and R' have the same row space. Thus R = R'
and A is row-equivalent to B. ∎


To summarize, if A and B are $m \times n$ matrices over the field F, the
following statements are equivalent:


1. A and B are row-equivalent.
2. A and B have the same row space.
3. B = PA, where P is an invertible $m \times m$ matrix.


A fourth equivalent statement is that the homogeneous systems
AX = 0 and BX = 0 have the same solutions; however, although we
know that the row-equivalence of A and B implies that these systems
have the same solutions, it seems best to leave the proof of the converse
until later.
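By the corollaries of Theorem 11, deciding whether two m × n matrices are row-equivalent reduces to comparing their row-reduced echelon forms. The sketch below works over Q. The helpers `rref` and `row_equivalent` are our own, and P is simply an invertible matrix picked for the example.

```python
from fractions import Fraction


def rref(rows):
    """Row-reduced echelon form over Q; returns (R, pivot_columns)."""
    R = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(R[0]) if R else 0):
        p = next((i for i in range(r, len(R)) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots


def row_equivalent(A, B):
    """A and B (same shape) are row-equivalent iff their RREFs coincide."""
    return rref(A)[0] == rref(B)[0]


A = [[1, 2, 3], [2, 4, 7]]
P = [[1, 1], [0, 2]]                      # invertible (determinant 2)
PA = [[sum(P[i][k] * A[k][j] for k in range(2)) for j in range(3)]
      for i in range(2)]
print(row_equivalent(A, PA))                      # True
print(row_equivalent(A, [[1, 2, 3], [0, 0, 0]]))  # False: different row spaces
```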


2.6. Computations Concerning Subspaces 


We should like now to show how elementary row operations provide
a standardized method of answering certain concrete questions concerning
subspaces of $F^n$. We have already derived the facts we shall need. They
are gathered here for the convenience of the reader. The discussion applies
to any n-dimensional vector space over the field F, if one selects a fixed
ordered basis $\mathcal{B}$ and describes each vector $\alpha$ in V by the n-tuple $(x_1, \ldots, x_n)$
which gives the coordinates of $\alpha$ in the ordered basis $\mathcal{B}$.

Suppose we are given m vectors $\alpha_1, \ldots, \alpha_m$ in $F^n$. We consider the
following questions.


1. How does one determine if the vectors $\alpha_1, \ldots, \alpha_m$ are linearly
independent? More generally, how does one find the dimension of the
subspace W spanned by these vectors?
2. Given $\beta$ in $F^n$, how does one determine whether $\beta$ is a linear
combination of $\alpha_1, \ldots, \alpha_m$, i.e., whether $\beta$ is in the subspace W?
3. How can one give an explicit description of the subspace W?


The third question is a little vague, since it does not specify what is 
meant by an ‘explicit description’; however, we shall clear up this point 
by giving the sort of description we have in mind. With this description, 
questions (1) and (2) can be answered immediately. 

Let A be the $m \times n$ matrix with row vectors $\alpha_i$:
$$\alpha_i = (A_{i1}, \ldots, A_{in}).$$
Perform a sequence of elementary row operations, starting with A and
terminating with a row-reduced echelon matrix R. We have previously
described how to do this. At this point, the dimension of W (the row space
of A) is apparent, since this dimension is simply the number of non-zero
row vectors of R. If $\rho_1, \ldots, \rho_r$ are the non-zero row vectors of R, then
$\mathcal{B} = \{\rho_1, \ldots, \rho_r\}$ is a basis for W. If the first non-zero coordinate of $\rho_i$ is
the $k_i$th one, then we have for $i \le r$

(a) $R(i, j) = 0$, if $j < k_i$

(b) $R(i, k_j) = \delta_{ij}$

(c) $k_1 < \cdots < k_r$.


The subspace W consists of all vectors
$$\beta = c_1\rho_1 + \cdots + c_r\rho_r = \sum_{i=1}^{r} c_i(R_{i1}, \ldots, R_{in}).$$
The coordinates $b_1, \ldots, b_n$ of such a vector $\beta$ are then
$$b_j = \sum_{i=1}^{r} c_iR_{ij}. \tag{2-23}$$
In particular, $b_{k_j} = c_j$, and so if $\beta = (b_1, \ldots, b_n)$ is a linear combination
of the $\rho_i$, it must be the particular linear combination
$$\beta = \sum_{i=1}^{r} b_{k_i}\rho_i. \tag{2-24}$$
The conditions on $\beta$ that (2-24) should hold are
$$b_j = \sum_{i=1}^{r} b_{k_i}R_{ij}, \qquad j = 1, \ldots, n. \tag{2-25}$$
Now (2-25) is the explicit description of the subspace W spanned by
$\alpha_1, \ldots, \alpha_m$, that is, the subspace consists of all vectors $\beta$ in $F^n$ whose
coordinates satisfy (2-25). What kind of description is (2-25)? In the first
place it describes W as all solutions $\beta = (b_1, \ldots, b_n)$ of the system of
homogeneous linear equations (2-25). This system of equations is of a
very special nature, because it expresses $(n - r)$ of the coordinates as
linear combinations of the r distinguished coordinates $b_{k_1}, \ldots, b_{k_r}$. One
has complete freedom of choice in the coordinates $b_{k_i}$, that is, if $c_1, \ldots, c_r$
are any r scalars, there is one and only one vector $\beta$ in W which has $c_i$ as
its $k_i$th coordinate.

The significant point here is this: Given the vectors $\alpha_i$, row-reduction
is a straightforward method of determining the integers $r, k_1, \ldots, k_r$ and
the scalars $R_{ij}$ which give the description (2-25) of the subspace spanned
by $\alpha_1, \ldots, \alpha_m$. One should observe as we did in Theorem 11 that every
subspace W of $F^n$ has a description of the type (2-25). We should also
point out some things about question (2). We have already stated how
one can find an invertible $m \times m$ matrix P such that R = PA, in Section
1.4. The knowledge of P enables one to find the scalars $x_1, \ldots, x_m$ such
that
$$\beta = x_1\alpha_1 + \cdots + x_m\alpha_m$$
when this is possible. For the row vectors of R are given by
$$\rho_i = \sum_{j=1}^{m} P_{ij}\alpha_j$$
so that if $\beta$ is a linear combination of the $\alpha_j$, we have
$$\begin{aligned}
\beta &= \sum_{i=1}^{r} b_{k_i}\rho_i\\
&= \sum_{i=1}^{r} b_{k_i}\sum_{j=1}^{m} P_{ij}\alpha_j\\
&= \sum_{j=1}^{m}\sum_{i=1}^{r} b_{k_i}P_{ij}\alpha_j
\end{aligned}$$
and thus
$$x_j = \sum_{i=1}^{r} b_{k_i}P_{ij}$$
is one possible choice for the $x_j$ (there may be many).
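The recipe x_j = Σ_i b_{k_i} P_ij can be carried out by reducing the augmented matrix [A | I]. This records the row operations in P alongside R. The sketch below works over Q, and the helper names and sample vectors are hypothetical choices for the illustration. It returns None when β fails the membership test (2-25). Pivot columns are 0-based in the code.

```python
from fractions import Fraction


def reduce_with_P(A):
    """Row-reduce A to R, applying the same operations to I, so that R = P A.

    Returns (R, P, pivots), where pivots = [k_1, ..., k_r] (0-based columns).
    """
    m, n = len(A), len(A[0])
    M = [[Fraction(x) for x in A[i]] + [Fraction(int(i == j)) for j in range(m)]
         for i in range(m)]
    pivots, r = [], 0
    for c in range(n):                        # pivots only in the columns of A
        p = next((i for i in range(r, m) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        piv = M[r][c]
        M[r] = [x / piv for x in M[r]]
        for i in range(m):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    return [row[:n] for row in M], [row[n:] for row in M], pivots


def combination_coefficients(A, beta):
    """One choice of x with x_1 alpha_1 + ... + x_m alpha_m = beta (alpha_j = rows of A)."""
    R, P, k = reduce_with_P(A)
    r = len(k)
    # (2-25): beta is in W iff it equals sum_i b_{k_i} rho_i
    test = [sum(beta[k[i]] * R[i][j] for i in range(r)) for j in range(len(beta))]
    if test != list(beta):
        return None
    return [sum(beta[k[i]] * P[i][j] for i in range(r)) for j in range(len(A))]


A = [(1, 0, 1), (2, 1, 3), (1, 1, 2)]         # row 3 = row 2 - row 1
x = combination_coefficients(A, (3, 1, 4))
print([str(v) for v in x])                     # ['1', '1', '0']
print(combination_coefficients(A, (0, 0, 1)))  # None: (0, 0, 1) is not in W
```

Because the rows of this A are dependent, other coefficient vectors, such as (0, 2, -1), also work. This reflects the remark that there may be many choices.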

The question of whether $\beta = (b_1, \ldots, b_n)$ is a linear combination of
the $\alpha_i$, and if so, what the scalars $x_i$ are, can also be looked at by asking
whether the system of equations
$$\sum_{i=1}^{m} A_{ij}x_i = b_j, \qquad 1 \le j \le n$$
has a solution and what the solutions are. The coefficient matrix of this
system of equations is the $n \times m$ matrix B with column vectors $\alpha_1, \ldots, \alpha_m$.
In Chapter 1 we discussed the use of elementary row operations in solving
a system of equations BX = Y. Let us consider one example in which we
adopt both points of view in answering questions about subspaces of $F^n$.


EXAMPLE 21. Let us pose the following problem. Let W be the subspace
of $R^4$ spanned by the vectors
$$\begin{aligned}
\alpha_1 &= (1, 2, 2, 1)\\
\alpha_2 &= (0, 2, 0, 1)\\
\alpha_3 &= (-2, 0, -4, 3).
\end{aligned}$$
(a) Prove that $\alpha_1, \alpha_2, \alpha_3$ form a basis for W, i.e., that these vectors
are linearly independent.

(b) Let $\beta = (b_1, b_2, b_3, b_4)$ be a vector in W. What are the coordinates
of $\beta$ relative to the ordered basis $\{\alpha_1, \alpha_2, \alpha_3\}$?

(c) Let
$$\begin{aligned}
\alpha'_1 &= (1, 0, 2, 0)\\
\alpha'_2 &= (0, 2, 0, 1)\\
\alpha'_3 &= (0, 0, 0, 3).
\end{aligned}$$
Show that $\alpha'_1, \alpha'_2, \alpha'_3$ form a basis for W.

(d) If $\beta$ is in W, let X denote the coordinate matrix of $\beta$ relative to
the $\alpha$-basis and X' the coordinate matrix of $\beta$ relative to the $\alpha'$-basis. Find
the $3 \times 3$ matrix P such that X = PX' for every such $\beta$.

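To answer these questions by the first method we form the matrix A
with row vectors $\alpha_1, \alpha_2, \alpha_3$, find the row-reduced echelon matrix R which
is row-equivalent to A and simultaneously perform the same operations
on the identity to obtain the invertible matrix Q such that R = QA:
$$\begin{bmatrix} 1 & 2 & 2 & 1 \\ 0 & 2 & 0 & 1 \\ -2 & 0 & -4 & 3 \end{bmatrix} \to R = \begin{bmatrix} 1 & 0 & 2 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
$$\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \to Q = \frac{1}{6}\begin{bmatrix} 6 & -6 & 0 \\ -2 & 5 & -1 \\ 4 & -4 & 2 \end{bmatrix}$$

(a) Clearly R has rank 3, so $\alpha_1$, $\alpha_2$ and $\alpha_3$ are independent.

(b) Which vectors $\beta = (b_1, b_2, b_3, b_4)$ are in W? We have the basis
for W given by $\rho_1, \rho_2, \rho_3$, the row vectors of R. One can see at a glance that
the span of $\rho_1, \rho_2, \rho_3$ consists of the vectors $\beta$ for which $b_3 = 2b_1$. For such
a $\beta$ we have
$$\begin{aligned}
\beta &= b_1\rho_1 + b_2\rho_2 + b_4\rho_3\\
&= [b_1\ b_2\ b_4]\,R\\
&= [b_1\ b_2\ b_4]\,QA\\
&= x_1\alpha_1 + x_2\alpha_2 + x_3\alpha_3
\end{aligned}$$
where $x_i = [b_1\ b_2\ b_4]\,Q_i$ and $Q_i$ denotes the ith column of Q:
$$\begin{aligned}
x_1 &= b_1 - \tfrac{1}{3}b_2 + \tfrac{2}{3}b_4\\
x_2 &= -b_1 + \tfrac{5}{6}b_2 - \tfrac{2}{3}b_4\\
x_3 &= -\tfrac{1}{6}b_2 + \tfrac{1}{3}b_4.
\end{aligned} \tag{2-26}$$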
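(c) The vectors $\alpha'_1, \alpha'_2, \alpha'_3$ are all of the form $(y_1, y_2, y_3, y_4)$ with $y_3 = 2y_1$
and thus they are in W. One can see at a glance that they are independent.


(d) The matrix P has for its columns
$$P_j = [\alpha'_j]_{\mathcal{B}}$$
where $\mathcal{B} = \{\alpha_1, \alpha_2, \alpha_3\}$. The equations (2-26) tell us how to find the
coordinate matrices for $\alpha'_1, \alpha'_2, \alpha'_3$. For example with $\beta = \alpha'_1$ we have $b_1 = 1$,
$b_2 = 0$, $b_3 = 2$, $b_4 = 0$, and
$$\begin{aligned}
x_1 &= 1 - \tfrac{1}{3}(0) + \tfrac{2}{3}(0) = 1\\
x_2 &= -1 + \tfrac{5}{6}(0) - \tfrac{2}{3}(0) = -1\\
x_3 &= -\tfrac{1}{6}(0) + \tfrac{1}{3}(0) = 0.
\end{aligned}$$
Thus $\alpha'_1 = \alpha_1 - \alpha_2$. Similarly we obtain $\alpha'_2 = \alpha_2$ and $\alpha'_3 = 2\alpha_1 - 2\alpha_2 + \alpha_3$.
Hence
$$P = \begin{bmatrix} 1 & 0 & 2 \\ -1 & 1 & -2 \\ 0 & 0 & 1 \end{bmatrix}.$$

Now let us see how we would answer the questions by the second
method which we described. We form the $4 \times 3$ matrix B with column
vectors $\alpha_1, \alpha_2, \alpha_3$:
$$B = \begin{bmatrix} 1 & 0 & -2 \\ 2 & 2 & 0 \\ 2 & 0 & -4 \\ 1 & 1 & 3 \end{bmatrix}.$$
We inquire for which $y_1, y_2, y_3, y_4$ the system BX = Y has a solution.
$$\left[\begin{array}{ccc|c} 1 & 0 & -2 & y_1 \\ 2 & 2 & 0 & y_2 \\ 2 & 0 & -4 & y_3 \\ 1 & 1 & 3 & y_4 \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & -2 & y_1 \\ 0 & 2 & 4 & y_2 - 2y_1 \\ 0 & 0 & 0 & y_3 - 2y_1 \\ 0 & 1 & 5 & y_4 - y_1 \end{array}\right]$$
$$\to \left[\begin{array}{ccc|c} 1 & 0 & -2 & y_1 \\ 0 & 1 & 5 & y_4 - y_1 \\ 0 & 0 & 0 & y_3 - 2y_1 \\ 0 & 0 & 1 & \frac{1}{6}(2y_4 - y_2) \end{array}\right] \to \left[\begin{array}{ccc|c} 1 & 0 & 0 & y_1 - \frac{1}{3}y_2 + \frac{2}{3}y_4 \\ 0 & 1 & 0 & -y_1 + \frac{5}{6}y_2 - \frac{2}{3}y_4 \\ 0 & 0 & 0 & y_3 - 2y_1 \\ 0 & 0 & 1 & -\frac{1}{6}y_2 + \frac{1}{3}y_4 \end{array}\right]$$
Thus the condition that the system BX = Y have a solution is $y_3 = 2y_1$.
So $\beta = (b_1, b_2, b_3, b_4)$ is in W if and only if $b_3 = 2b_1$. If $\beta$ is in W, then the
coordinates $(x_1, x_2, x_3)$ in the ordered basis $\{\alpha_1, \alpha_2, \alpha_3\}$ can be read off from
the last matrix above. We obtain once again the formulas (2-26) for those
coordinates.

The questions (c) and (d) are now answered as before.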
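Example 21 can be re-checked with exact arithmetic. The sketch below works over Q with a `matmul` helper of our own. It confirms R = QA, implements the formulas (2-26) as a function `coords` (our name), and rebuilds the matrix P of part (d) from its columns [α'_j]_𝓑.

```python
from fractions import Fraction as Fr


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


alphas = [(1, 2, 2, 1), (0, 2, 0, 1), (-2, 0, -4, 3)]      # rows of A
Q = [[Fr(v, 6) for v in row] for row in [[6, -6, 0], [-2, 5, -1], [4, -4, 2]]]
R = [[1, 0, 2, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
assert matmul(Q, alphas) == R                               # R = QA


def coords(b):
    """Formulas (2-26): coordinates of b in the ordered basis {alpha_1, alpha_2, alpha_3}."""
    b1, b2, b3, b4 = b
    if b3 != 2 * b1:
        raise ValueError("b is not in W (need b3 = 2 b1)")
    return [b1 - Fr(1, 3) * b2 + Fr(2, 3) * b4,
            -b1 + Fr(5, 6) * b2 - Fr(2, 3) * b4,
            -Fr(1, 6) * b2 + Fr(1, 3) * b4]


alpha_primes = [(1, 0, 2, 0), (0, 2, 0, 1), (0, 0, 0, 3)]
cols = [coords(a) for a in alpha_primes]                    # P_j = [alpha'_j]_B
P = [[cols[j][i] for j in range(3)] for i in range(3)]
print([[str(v) for v in row] for row in P])  # [['1', '0', '2'], ['-1', '1', '-2'], ['0', '0', '1']]
```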
EXAMPLE 22. We consider the $5 \times 5$ matrix
$$A = \begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 1 & 2 & -1 & -1 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 2 & 4 & 1 & 10 & 1 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
and the following problems concerning A:


(a) Find an invertible matrix P such that PA is a row-reduced
echelon matrix R.

(b) Find a basis for the row space W of A.

(c) Say which vectors $(b_1, b_2, b_3, b_4, b_5)$ are in W.

(d) Find the coordinate matrix of each vector $(b_1, b_2, b_3, b_4, b_5)$ in W
in the ordered basis chosen in (b).

(e) Write each vector $(b_1, b_2, b_3, b_4, b_5)$ in W as a linear combination
of the rows of A.

(f) Give an explicit description of the vector space V of all $5 \times 1$
column matrices X such that AX = 0.

(g) Find a basis for V.

(h) For what $5 \times 1$ column matrices Y does the equation AX = Y
have solutions X?


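To solve these problems we form the augmented matrix A' of the
system AX = Y and apply an appropriate sequence of row operations
to A'.
$$\left[\begin{array}{ccccc|c} 1 & 2 & 0 & 3 & 0 & y_1 \\ 1 & 2 & -1 & -1 & 0 & y_2 \\ 0 & 0 & 1 & 4 & 0 & y_3 \\ 2 & 4 & 1 & 10 & 1 & y_4 \\ 0 & 0 & 0 & 0 & 1 & y_5 \end{array}\right] \to \left[\begin{array}{ccccc|c} 1 & 2 & 0 & 3 & 0 & y_1 \\ 0 & 0 & -1 & -4 & 0 & -y_1 + y_2 \\ 0 & 0 & 1 & 4 & 0 & y_3 \\ 0 & 0 & 1 & 4 & 1 & -2y_1 + y_4 \\ 0 & 0 & 0 & 0 & 1 & y_5 \end{array}\right]$$
$$\to \left[\begin{array}{ccccc|c} 1 & 2 & 0 & 3 & 0 & y_1 \\ 0 & 0 & 1 & 4 & 0 & y_1 - y_2 \\ 0 & 0 & 0 & 0 & 0 & -y_1 + y_2 + y_3 \\ 0 & 0 & 0 & 0 & 1 & -3y_1 + y_2 + y_4 \\ 0 & 0 & 0 & 0 & 1 & y_5 \end{array}\right] \to \left[\begin{array}{ccccc|c} 1 & 2 & 0 & 3 & 0 & y_1 \\ 0 & 0 & 1 & 4 & 0 & y_1 - y_2 \\ 0 & 0 & 0 & 0 & 1 & y_5 \\ 0 & 0 & 0 & 0 & 0 & -y_1 + y_2 + y_3 \\ 0 & 0 & 0 & 0 & 0 & -3y_1 + y_2 + y_4 - y_5 \end{array}\right]$$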
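(a) If
$$PY = \begin{bmatrix} y_1 \\ y_1 - y_2 \\ y_5 \\ -y_1 + y_2 + y_3 \\ -3y_1 + y_2 + y_4 - y_5 \end{bmatrix}$$
for all Y, then
$$P = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 1 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ -1 & 1 & 1 & 0 & 0 \\ -3 & 1 & 0 & 1 & -1 \end{bmatrix}$$
hence PA is the row-reduced echelon matrix
$$R = \begin{bmatrix} 1 & 2 & 0 & 3 & 0 \\ 0 & 0 & 1 & 4 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$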


It should be stressed that the matrix P is not unique. There are, in fact, 
many invertible matrices P (which arise from different choices for the 
operations used to reduce A’) such that PA = R. 

(b) As a basis for W we may take the non-zero rows

$$
\rho_1 = (1, 2, 0, 3, 0), \qquad \rho_2 = (0, 0, 1, 4, 0), \qquad \rho_3 = (0, 0, 0, 0, 1)
$$

of R.
(c) The row space W consists of all vectors of the form

$$
\beta = c_1\rho_1 + c_2\rho_2 + c_3\rho_3 = (c_1,\ 2c_1,\ c_2,\ 3c_1 + 4c_2,\ c_3)
$$

where $c_1, c_2, c_3$ are arbitrary scalars. Thus $(b_1, b_2, b_3, b_4, b_5)$ is in W if and only if

$$
(b_1, b_2, b_3, b_4, b_5) = b_1\rho_1 + b_3\rho_2 + b_5\rho_3
$$

which is true if and only if

$$
b_2 = 2b_1, \qquad b_4 = 3b_1 + 4b_3.
$$

These equations are instances of the general system (2-25), and using them we may tell at a glance whether a given vector lies in W. Thus $(-5, -10, 1, -11, 20)$ is a linear combination of the rows of A, but $(1, 2, 3, 4, 5)$ is not (for the latter, $b_4 = 4 \ne 3 \cdot 1 + 4 \cdot 3 = 15$).
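The two conditions in (c) give a direct membership test. The sketch below encodes them in a function `in_W` and checks them against the general element $c_1\rho_1 + c_2\rho_2 + c_3\rho_3$ built by `combo`; both names are hypothetical helpers for illustration, not anything defined in the text.

```python
# Illustrative sketch; the helper names are hypothetical.
rho = [(1, 2, 0, 3, 0),
       (0, 0, 1, 4, 0),
       (0, 0, 0, 0, 1)]


def in_W(b):
    """True exactly when b2 = 2*b1 and b4 = 3*b1 + 4*b3, the conditions in (c)."""
    b1, b2, b3, b4, b5 = b
    return b2 == 2 * b1 and b4 == 3 * b1 + 4 * b3


def combo(c):
    """The vector c1*rho1 + c2*rho2 + c3*rho3 of W."""
    return tuple(sum(ci * r[k] for ci, r in zip(c, rho)) for k in range(5))


print(in_W((-5, -10, 1, -11, 20)))  # True
print(in_W((1, 2, 3, 4, 5)))        # False, since 4 != 3*1 + 4*3
```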

(d) The coordinate matrix of the vector $(b_1, 2b_1, b_3, 3b_1 + 4b_3, b_5)$ in the basis $\{\rho_1, \rho_2, \rho_3\}$ is evidently

$$
\begin{bmatrix} b_1\\ b_3\\ b_5 \end{bmatrix}.
$$


(e) There are many ways to write the vectors in W as linear combi- 
nations of the rows of A. Perhaps the easiest method is to follow the first 
procedure indicated before Example 21: 


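$$
\begin{aligned}
\beta &= (b_1, 2b_1, b_3, 3b_1 + 4b_3, b_5)\\
&= [b_1, b_3, b_5, 0, 0]\cdot R\\
&= [b_1, b_3, b_5, 0, 0]\cdot PA\\
&= [b_1, b_3, b_5, 0, 0]
\begin{bmatrix}
1 & 0 & 0 & 0 & 0\\
1 & -1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1\\
-1 & 1 & 1 & 0 & 0\\
-3 & 1 & 0 & 1 & -1
\end{bmatrix}\cdot A\\
&= [b_1 + b_3, -b_3, 0, 0, b_5]\cdot A.
\end{aligned}
$$

In particular, with $\beta = (-5, -10, 1, -11, 20)$ we have

$$
\beta = (-4, -1, 0, 0, 20)
\begin{bmatrix}
1 & 2 & 0 & 3 & 0\\
1 & 2 & -1 & -1 & 0\\
0 & 0 & 1 & 4 & 0\\
2 & 4 & 1 & 10 & 1\\
0 & 0 & 0 & 0 & 1
\end{bmatrix}.
$$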
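The formula obtained in (e) can be checked numerically. In the sketch below, the hypothetical helper `coefficients` returns the row vector $(b_1 + b_3, -b_3, 0, 0, b_5)$, and multiplying that row vector into A should reproduce $\beta$ for every $\beta$ in W.

```python
# Illustrative sketch; the helper names are hypothetical.
A = [[1, 2, 0, 3, 0],
     [1, 2, -1, -1, 0],
     [0, 0, 1, 4, 0],
     [2, 4, 1, 10, 1],
     [0, 0, 0, 0, 1]]


def coefficients(b):
    """Row vector x with x A = b, valid for b in W (the formula from part (e))."""
    b1, _, b3, _, b5 = b
    return (b1 + b3, -b3, 0, 0, b5)


def row_times_matrix(x, M):
    """Row vector x times the matrix M, i.e. a linear combination of M's rows."""
    return tuple(sum(x[i] * M[i][j] for i in range(len(M)))
                 for j in range(len(M[0])))


beta = (-5, -10, 1, -11, 20)
x = coefficients(beta)
print(x)                         # (-4, -1, 0, 0, 20)
print(row_times_matrix(x, A))    # (-5, -10, 1, -11, 20)
```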


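(f) The equations in the system RX = 0 are

$$
\begin{aligned}
x_1 + 2x_2 + 3x_4 &= 0\\
x_3 + 4x_4 &= 0\\
x_5 &= 0.
\end{aligned}
$$

Thus V consists of all columns of the form

$$
X = \begin{bmatrix} -2x_2 - 3x_4\\ x_2\\ -4x_4\\ x_4\\ 0 \end{bmatrix}
$$

where $x_2$ and $x_4$ are arbitrary.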
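(g) The columns

$$
\begin{bmatrix} -2\\ 1\\ 0\\ 0\\ 0 \end{bmatrix}, \qquad
\begin{bmatrix} -3\\ 0\\ -4\\ 1\\ 0 \end{bmatrix}
$$

form a basis of V. This is an example of the basis described in Example 15.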
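The basis in (g) comes from setting one free variable ($x_2$ or $x_4$) equal to 1 and the other equal to 0 in the general solution of (f). The sketch below, with hypothetical helper names, does exactly that and confirms that both columns solve AX = 0 and that every solution is the corresponding combination of them.

```python
# Illustrative sketch; the helper names are hypothetical.
A = [[1, 2, 0, 3, 0],
     [1, 2, -1, -1, 0],
     [0, 0, 1, 4, 0],
     [2, 4, 1, 10, 1],
     [0, 0, 0, 0, 1]]


def general_solution(x2, x4):
    """The column X of part (f), written as a tuple."""
    return (-2 * x2 - 3 * x4, x2, -4 * x4, x4, 0)


def apply(M, X):
    """The product M X for a column X given as a tuple."""
    return tuple(sum(M[i][j] * X[j] for j in range(len(X))) for i in range(len(M)))


basis = [general_solution(1, 0), general_solution(0, 1)]
print(basis)   # [(-2, 1, 0, 0, 0), (-3, 0, -4, 1, 0)]
```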
(h) The equation AX = Y has solutions X if and only if

$$
\begin{aligned}
-y_1 + y_2 + y_3 &= 0\\
-3y_1 + y_2 + y_4 - y_5 &= 0.
\end{aligned}
$$
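The two conditions in (h) are the last two entries of PY, read off from the final augmented matrix. As a quick consistency check (a sketch with hypothetical helper names), every Y of the form AX satisfies them, while a Y that violates either one cannot be of that form.

```python
# Illustrative sketch; the helper names are hypothetical.
A = [[1, 2, 0, 3, 0],
     [1, 2, -1, -1, 0],
     [0, 0, 1, 4, 0],
     [2, 4, 1, 10, 1],
     [0, 0, 0, 0, 1]]


def solvable(Y):
    """True exactly when AX = Y has a solution, by the two conditions in (h)."""
    y1, y2, y3, y4, y5 = Y
    return -y1 + y2 + y3 == 0 and -3 * y1 + y2 + y4 - y5 == 0


def apply(M, X):
    """The column M X, with X and the result written as tuples."""
    return tuple(sum(M[i][j] * X[j] for j in range(len(X))) for i in range(len(M)))


print(solvable(apply(A, (2, -1, 3, 5, -4))))   # True: this Y is AX by construction
print(solvable((1, 0, 0, 0, 0)))               # False
```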


Exercises 


1. Let $s < n$ and A an $s \times n$ matrix with entries in the field F. Use Theorem 4 (not its proof) to show that there is a non-zero X in $F^{n \times 1}$ such that AX = 0.


2. Let

$$
\alpha_1 = (1, 1, -2, 1), \qquad \alpha_2 = (3, 0, 4, -1), \qquad \alpha_3 = (-1, 2, 5, 2).
$$

Let

$$
\alpha = (4, -5, 9, -7), \qquad \beta = (3, 1, -4, 4), \qquad \gamma = (-1, 1, 0, 1).
$$

(a) Which of the vectors $\alpha, \beta, \gamma$ are in the subspace of $R^4$ spanned by the $\alpha_i$?
(b) Which of the vectors $\alpha, \beta, \gamma$ are in the subspace of $C^4$ spanned by the $\alpha_i$?
(c) Does this suggest a theorem?


3. Consider the vectors in $R^4$ defined by

$$
\alpha_1 = (-1, 0, 1, 2), \qquad \alpha_2 = (3, 4, -2, 5), \qquad \alpha_3 = (1, 4, 0, 9).
$$

Find a system of homogeneous linear equations for which the space of solutions is exactly the subspace of $R^4$ spanned by the three given vectors.
4. In $C^3$, let

$$
\alpha_1 = (1, 0, -i), \qquad \alpha_2 = (1 + i, 1 - i, 1), \qquad \alpha_3 = (i, i, i).
$$

Prove that these vectors form a basis for $C^3$. What are the coordinates of the vector $(a, b, c)$ in this basis?
5. Give an explicit description of the type (2-25) for the vectors

$$
\beta = (b_1, b_2, b_3, b_4, b_5)
$$

in $R^5$ which are linear combinations of the vectors

$$
\begin{aligned}
\alpha_1 &= (1, 0, 2, 1, -1), & \alpha_2 &= (-1, 2, -4, 2, 0),\\
\alpha_3 &= (2, -1, 5, 2, 1), & \alpha_4 &= (2, 1, 3, 5, 2).
\end{aligned}
$$
6. Let V be the real vector space spanned by the rows of the matrix

$$
A = \begin{bmatrix}
3 & 21 & 0 & 9 & 0\\
1 & 7 & -1 & -2 & -1\\
2 & 14 & 0 & 6 & 1\\
6 & 42 & -1 & 13 & 0
\end{bmatrix}.
$$

(a) Find a basis for V.

(b) Tell which vectors $(x_1, x_2, x_3, x_4, x_5)$ are elements of V.

(c) If $(x_1, x_2, x_3, x_4, x_5)$ is in V what are its coordinates in the basis chosen in part (a)?

7. Let A be an $m \times n$ matrix over the field F, and consider the system of equations AX = Y. Prove that this system of equations has a solution if and only if the row rank of A is equal to the row rank of the augmented matrix of the system.


3. Linear Transformations 


3.1. Linear Transformations 


We shall now introduce linear transformations, the objects which we 
shall study in most of the remainder of this book. The reader may find it 
helpful to read (or reread) the discussion of functions in the Appendix, 
since we shall freely use the terminology of that discussion. 


Definition. Let V and W be vector spaces over the field F. A linear transformation from V into W is a function T from V into W such that

$$
T(c\alpha + \beta) = c(T\alpha) + T\beta
$$

for all $\alpha$ and $\beta$ in V and all scalars c in F.


EXAMPLE 1. If V is any vector space, the identity transformation I, defined by $I\alpha = \alpha$, is a linear transformation from V into V. The zero transformation 0, defined by $0\alpha = 0$, is a linear transformation from V into V.


EXAMPLE 2. Let F be a field and let V be the space of polynomial functions f from F into F, given by

$$
f(x) = c_0 + c_1x + \cdots + c_kx^k.
$$

Let

$$
(Df)(x) = c_1 + 2c_2x + \cdots + kc_kx^{k-1}.
$$

Then D is a linear transformation from V into V, the differentiation transformation.
EXAMPLE 3. Let A be a fixed $m \times n$ matrix with entries in the field F. The function T defined by T(X) = AX is a linear transformation from $F^{n \times 1}$ into $F^{m \times 1}$. The function U defined by $U(\alpha) = \alpha A$ is a linear transformation from $F^m$ into $F^n$.


EXAMPLE 4. Let P be a fixed $m \times m$ matrix with entries in the field F and let Q be a fixed $n \times n$ matrix over F. Define a function T from the space $F^{m \times n}$ into itself by T(A) = PAQ. Then T is a linear transformation from $F^{m \times n}$ into $F^{m \times n}$, because

$$
\begin{aligned}
T(cA + B) &= P(cA + B)Q\\
&= (cPA + PB)Q\\
&= cPAQ + PBQ\\
&= cT(A) + T(B).
\end{aligned}
$$


EXAMPLE 5. Let R be the field of real numbers and let V be the space of all functions from R into R which are continuous. Define T by

$$
(Tf)(x) = \int_0^x f(t)\,dt.
$$

Then T is a linear transformation from V into V. The function Tf is not only continuous but has a continuous first derivative. The linearity of integration is one of its fundamental properties.


The reader should have no difficulty in verifying that the transfor- 
mations defined in Examples 1, 2, 3, and 5 are linear transformations. We 
shall expand our list of examples considerably as we learn more about 
linear transformations. 

It is important to note that if T is a linear transformation from V into W, then T(0) = 0; one can see this from the definition because

$$
T(0) = T(0 + 0) = T(0) + T(0),
$$

and subtracting T(0) from both sides gives T(0) = 0.


This point is often confusing to the person who is studying linear algebra for the first time, since he probably has been exposed to a slightly different use of the term ‘linear function.’ A brief comment should clear up the confusion. Suppose V is the vector space $R^1$. A linear transformation from V into V is then a particular type of real-valued function on the real line R. In a calculus course, one would probably call such a function linear if its graph is a straight line. A linear transformation from $R^1$ into $R^1$, according to our definition, will be a function from R into R, the graph of which is a straight line passing through the origin.

In addition to the property T(0) = 0, let us point out another property of the general linear transformation T. Such a transformation ‘preserves’ linear combinations; that is, if $\alpha_1, \ldots, \alpha_n$ are vectors in V and $c_1, \ldots, c_n$ are scalars, then

$$
T(c_1\alpha_1 + \cdots + c_n\alpha_n) = c_1(T\alpha_1) + \cdots + c_n(T\alpha_n).
$$

This follows readily from the definition. For example,

$$
\begin{aligned}
T(c_1\alpha_1 + c_2\alpha_2) &= c_1(T\alpha_1) + T(c_2\alpha_2)\\
&= c_1(T\alpha_1) + c_2(T\alpha_2).
\end{aligned}
$$


Theorem 1. Let V be a finite-dimensional vector space over the field F and let $\{\alpha_1, \ldots, \alpha_n\}$ be an ordered basis for V. Let W be a vector space over the same field F and let $\beta_1, \ldots, \beta_n$ be any vectors in W. Then there is precisely one linear transformation T from V into W such that

$$
T\alpha_j = \beta_j, \qquad j = 1, \ldots, n.
$$


Proof. To prove there is some linear transformation T with $T\alpha_j = \beta_j$ we proceed as follows. Given $\alpha$ in V, there is a unique n-tuple $(x_1, \ldots, x_n)$ such that

$$
\alpha = x_1\alpha_1 + \cdots + x_n\alpha_n.
$$

For this vector $\alpha$ we define

$$
T\alpha = x_1\beta_1 + \cdots + x_n\beta_n.
$$

Then T is a well-defined rule for associating with each vector $\alpha$ in V a vector $T\alpha$ in W. From the definition it is clear that $T\alpha_j = \beta_j$ for each j. To see that T is linear, let

$$
\beta = y_1\alpha_1 + \cdots + y_n\alpha_n
$$

be in V and let c be any scalar. Now

$$
c\alpha + \beta = (cx_1 + y_1)\alpha_1 + \cdots + (cx_n + y_n)\alpha_n
$$

and so by definition

$$
T(c\alpha + \beta) = (cx_1 + y_1)\beta_1 + \cdots + (cx_n + y_n)\beta_n.
$$

On the other hand,

$$
c(T\alpha) + T\beta = c\sum_{i=1}^{n} x_i\beta_i + \sum_{i=1}^{n} y_i\beta_i = \sum_{i=1}^{n} (cx_i + y_i)\beta_i
$$

and thus

$$
T(c\alpha + \beta) = c(T\alpha) + T\beta.
$$

If U is a linear transformation from V into W with $U\alpha_j = \beta_j$, $j = 1, \ldots, n$, then for the vector $\alpha = \sum_{i=1}^{n} x_i\alpha_i$ we have

$$
U\alpha = U\Bigl(\sum_{i=1}^{n} x_i\alpha_i\Bigr) = \sum_{i=1}^{n} x_i(U\alpha_i) = \sum_{i=1}^{n} x_i\beta_i
$$

so that U is exactly the rule T which we defined above. This shows that the linear transformation T with $T\alpha_j = \beta_j$ is unique. ∎


Theorem 1 is quite elementary; however, it is so basic that we have 
stated it formally. The concept of function is very general. If V and W are 
(non-zero) vector spaces, there is a multitude of functions from V into W. 
Theorem 1 helps to underscore the fact that the functions which are linear 
are extremely special. 


EXAMPLE 6. The vectors

$$
\alpha_1 = (1, 2), \qquad \alpha_2 = (3, 4)
$$

are linearly independent and therefore form a basis for $R^2$. According to Theorem 1, there is a unique linear transformation from $R^2$ into $R^3$ such that

$$
T\alpha_1 = (3, 2, 1), \qquad T\alpha_2 = (6, 5, 4).
$$

In particular, we must be able to find $T(\epsilon_1)$. We find scalars $c_1, c_2$ such that $\epsilon_1 = c_1\alpha_1 + c_2\alpha_2$ and then we know that $T\epsilon_1 = c_1T\alpha_1 + c_2T\alpha_2$. If $(1, 0) = c_1(1, 2) + c_2(3, 4)$ then $c_1 = -2$ and $c_2 = 1$. Thus

$$
\begin{aligned}
T(1, 0) &= -2(3, 2, 1) + (6, 5, 4)\\
&= (0, 1, 2).
\end{aligned}
$$
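The computation in Example 6 follows the recipe in the proof of Theorem 1: find the coordinates of a vector relative to $\{\alpha_1, \alpha_2\}$, then take the same combination of the prescribed images. A minimal sketch over the rationals, assuming the 2 × 2 coordinate system is solved by Cramer's rule (our choice, not the text's); `coords` and `T` are hypothetical names.

```python
from fractions import Fraction

# Illustrative sketch; the helper names are hypothetical.
alpha1, alpha2 = (1, 2), (3, 4)
image1, image2 = (3, 2, 1), (6, 5, 4)   # prescribed values of T(alpha1), T(alpha2)


def coords(v):
    """Scalars (c1, c2) with v = c1*alpha1 + c2*alpha2, by Cramer's rule."""
    det = alpha1[0] * alpha2[1] - alpha2[0] * alpha1[1]
    c1 = Fraction(v[0] * alpha2[1] - alpha2[0] * v[1], det)
    c2 = Fraction(alpha1[0] * v[1] - v[0] * alpha1[1], det)
    return c1, c2


def T(v):
    """The linear T of Theorem 1: same coordinates, applied to the images."""
    c1, c2 = coords(v)
    return tuple(c1 * x + c2 * y for x, y in zip(image1, image2))


print([str(c) for c in coords((1, 0))])   # ['-2', '1']
print([str(t) for t in T((1, 0))])        # ['0', '1', '2']
```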


EXAMPLE 7. Let T be a linear transformation from the m-tuple space $F^m$ into the n-tuple space $F^n$. Theorem 1 tells us that T is uniquely determined by the sequence of vectors $\beta_1, \ldots, \beta_m$ where

$$
\beta_i = T\epsilon_i, \qquad i = 1, \ldots, m.
$$

In short, T is uniquely determined by the images of the standard basis vectors. The determination is

$$
\begin{aligned}
\alpha &= (x_1, \ldots, x_m)\\
T\alpha &= x_1\beta_1 + \cdots + x_m\beta_m.
\end{aligned}
$$

If B is the $m \times n$ matrix which has row vectors $\beta_1, \ldots, \beta_m$, this says that

$$
T\alpha = \alpha B.
$$

In other words, if $\beta_i = (B_{i1}, \ldots, B_{in})$, then

$$
T(x_1, \ldots, x_m) = [x_1 \ \cdots \ x_m]
\begin{bmatrix}
B_{11} & \cdots & B_{1n}\\
\vdots & & \vdots\\
B_{m1} & \cdots & B_{mn}
\end{bmatrix}.
$$

This is a very explicit description of the linear transformation. In Section 3.4 we shall make a serious study of the relationship between linear transformations and matrices. We shall not pursue the particular description $T\alpha = \alpha B$ because it has the matrix B on the right of the vector $\alpha$, and that can lead to some confusion. The point of this example is to show that we can give an explicit and reasonably simple description of all linear transformations from $F^m$ into $F^n$.
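The recipe of Example 7 (stack the images of the standard basis vectors as the rows of B, then compute $\alpha B$) can be sketched as follows. The particular map `T` below is a hypothetical linear map from $F^3$ into $F^2$ chosen only for illustration, and the helper names are likewise hypothetical.

```python
# Illustrative sketch; T and the helper names are hypothetical.
def T(x):
    """A sample linear map from F^3 into F^2."""
    x1, x2, x3 = x
    return (x1 + 2 * x3, x2 - x3)


def matrix_from_images(T, m):
    """The m x n matrix B whose i-th row is T(e_i), e_i the standard basis of F^m."""
    basis = [tuple(int(i == j) for j in range(m)) for i in range(m)]
    return [T(e) for e in basis]


def row_times(alpha, B):
    """The row vector alpha times the matrix B."""
    return tuple(sum(alpha[i] * B[i][j] for i in range(len(B)))
                 for j in range(len(B[0])))


B = matrix_from_images(T, 3)
print(B)                          # [(1, 0), (0, 1), (2, -1)]
print(row_times((1, 2, 3), B))    # (7, -1)
print(T((1, 2, 3)))               # (7, -1)
```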

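If T is a linear transformation from V into W, then the range of T is not only a subset of W; it is a subspace of W. Let $R_T$ be the range of T, that is, the set of all vectors $\beta$ in W such that $\beta = T\alpha$ for some $\alpha$ in V. The set $R_T$ is non-empty, since it contains T(0) = 0. Let $\beta_1$ and $\beta_2$ be in $R_T$ and let c be a scalar. There are vectors $\alpha_1$ and $\alpha_2$ in V such that $T\alpha_1 = \beta_1$ and $T\alpha_2 = \beta_2$. Since T is linear

$$
\begin{aligned}
T(c\alpha_1 + \alpha_2) &= cT\alpha_1 + T\alpha_2\\
&= c\beta_1 + \beta_2
\end{aligned}
$$

which shows that $c\beta_1 + \beta_2$ is also in $R_T$.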

Another interesting subspace associated with the linear transformation T is the set N consisting of the vectors $\alpha$ in V such that $T\alpha = 0$. It is a subspace of V because

(a) T(0) = 0, so that N is non-empty;

(b) if $T\alpha_1 = T\alpha_2 = 0$, then

$$
\begin{aligned}
T(c\alpha_1 + \alpha_2) &= cT\alpha_1 + T\alpha_2\\
&= c0 + 0\\
&= 0
\end{aligned}
$$

so that $c\alpha_1 + \alpha_2$ is in N.


Definition. Let V and W be vector spaces over the field F and let T be a linear transformation from V into W. The null space of T is the set of all vectors $\alpha$ in V such that $T\alpha = 0$.

If V is finite-dimensional, the rank of T is the dimension of the range of T and the nullity of T is the dimension of the null space of T.


The following is one of the most important results in linear algebra. 


Theorem 2. Let V and W be vector spaces over the field F and let T be a linear transformation from V into W. Suppose that V is finite-dimensional. Then

$$
\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim V.
$$

Proof. Let $\{\alpha_1, \ldots, \alpha_k\}$ be a basis for N, the null space of T. There are vectors $\alpha_{k+1}, \ldots, \alpha_n$ in V such that $\{\alpha_1, \ldots, \alpha_n\}$ is a basis for V. We shall now prove that $\{T\alpha_{k+1}, \ldots, T\alpha_n\}$ is a basis for the range of T. The vectors $T\alpha_1, \ldots, T\alpha_n$ certainly span the range of T, and since $T\alpha_j = 0$ for $j \le k$, we see that $T\alpha_{k+1}, \ldots, T\alpha_n$ span the range. To see that these vectors are independent, suppose we have scalars $c_i$ such that

$$
\sum_{i=k+1}^{n} c_i(T\alpha_i) = 0.
$$

This says that

$$
T\Bigl(\sum_{i=k+1}^{n} c_i\alpha_i\Bigr) = 0
$$

and accordingly the vector $\alpha = \sum_{i=k+1}^{n} c_i\alpha_i$ is in the null space of T. Since $\alpha_1, \ldots, \alpha_k$ form a basis for N, there must be scalars $b_1, \ldots, b_k$ such that

$$
\alpha = \sum_{i=1}^{k} b_i\alpha_i.
$$

Thus

$$
\sum_{i=1}^{k} b_i\alpha_i - \sum_{j=k+1}^{n} c_j\alpha_j = 0
$$

and since $\alpha_1, \ldots, \alpha_n$ are linearly independent we must have

$$
b_1 = \cdots = b_k = c_{k+1} = \cdots = c_n = 0.
$$

If r is the rank of T, the fact that $T\alpha_{k+1}, \ldots, T\alpha_n$ form a basis for the range of T tells us that $r = n - k$. Since k is the nullity of T and n is the dimension of V, we are done. ∎
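Theorem 2 can be checked on the matrix of Example 22 in Chapter 2, viewed as the transformation $X \mapsto AX$ of $F^{5 \times 1}$. The sketch below is illustrative only: it computes the dimension of the range as the number of independent columns, builds a null-space basis with one vector per free column of the row-reduced echelon form (as in Example 15 of Chapter 2), and confirms that the two counts add up to n. All helper names are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch; the helper names are hypothetical.


def rref(M):
    """Row-reduced echelon form of M over the rationals, with its pivot columns."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    pivots, r = [], 0
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        piv = R[r][c]
        R[r] = [x / piv for x in R[r]]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == rows:
            break
    return R, pivots


def null_space_basis(M):
    """One solution of MX = 0 per free variable (free = non-pivot column)."""
    R, pivots = rref(M)
    n = len(M[0])
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        x = [Fraction(0)] * n
        x[f] = Fraction(1)
        for row, pc in enumerate(pivots):
            x[pc] = -R[row][f]
        basis.append(x)
    return basis


def range_dim(M):
    """Dimension of the range of X -> MX: the number of independent columns of M."""
    columns = [list(col) for col in zip(*M)]
    return len(rref(columns)[1])


A = [[1, 2, 0, 3, 0],
     [1, 2, -1, -1, 0],
     [0, 0, 1, 4, 0],
     [2, 4, 1, 10, 1],
     [0, 0, 0, 0, 1]]

rank, nullity = range_dim(A), len(null_space_basis(A))
print(rank, nullity, rank + nullity)   # 3 2 5
```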


Theorem 3. If A is an $m \times n$ matrix with entries in the field F, then

$$
\text{row rank}(A) = \text{column rank}(A).
$$

Proof. Let T be the linear transformation from $F^{n \times 1}$ into $F^{m \times 1}$ defined by T(X) = AX. The null space of T is the solution space for the system AX = 0, i.e., the set of all column matrices X such that AX = 0. The range of T is the set of all $m \times 1$ column matrices Y such that AX = Y has a solution for X. If $A_1, \ldots, A_n$ are the columns of A, then

$$
AX = x_1A_1 + \cdots + x_nA_n
$$

so that the range of T is the subspace spanned by the columns of A. In other words, the range of T is the column space of A. Therefore,

$$
\operatorname{rank}(T) = \text{column rank}(A).
$$

Theorem 2 tells us that if S is the solution space for the system AX = 0, then

$$
\dim S + \text{column rank}(A) = n.
$$

We now refer to Example 15 of Chapter 2. Our deliberations there showed that, if r is the dimension of the row space of A, then the solution space S has a basis consisting of $n - r$ vectors:

$$
\dim S = n - \text{row rank}(A).
$$

It is now apparent that

$$
\text{row rank}(A) = \text{column rank}(A).
$$

∎


The proof of Theorem 3 which we have just given depends upon explicit calculations concerning systems of linear equations. There is a more conceptual proof which does not rely on such calculations. We shall give such a proof in Section 3.7.
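Theorem 3 is easy to test on examples. The sketch below, using exact arithmetic via `fractions.Fraction`, computes the row rank of a matrix by Gauss-Jordan elimination and the column rank as the row rank of the transpose (whose rows are the columns of the original), then compares them. The helper names are hypothetical.

```python
from fractions import Fraction

# Illustrative sketch; the helper names are hypothetical.


def row_rank(M):
    """Dimension of the row space of M: the number of pivots found by
    Gauss-Jordan elimination over the rationals."""
    R = [[Fraction(x) for x in row] for row in M]
    rows, cols = len(R), len(R[0])
    r = 0  # pivots found so far
    for c in range(cols):
        p = next((i for i in range(r, rows) if R[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        R[r], R[p] = R[p], R[r]
        for i in range(rows):
            if i != r and R[i][c] != 0:
                f = R[i][c] / R[r][c]
                R[i] = [a - f * b for a, b in zip(R[i], R[r])]
        r += 1
        if r == rows:
            break
    return r


def column_rank(M):
    """Dimension of the column space of M, i.e. the row rank of its transpose."""
    return row_rank([list(col) for col in zip(*M)])


A = [[1, 2, 0, 3, 0],
     [1, 2, -1, -1, 0],
     [0, 0, 1, 4, 0],
     [2, 4, 1, 10, 1],
     [0, 0, 0, 0, 1]]
print(row_rank(A), column_rank(A))   # 3 3
```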


Exercises 


1. Which of the following functions T from $R^2$ into $R^2$ are linear transformations?

(a) $T(x_1, x_2) = (1 + x_1, x_2)$;

(b) $T(x_1, x_2) = (x_2, x_1)$;

(c) $T(x_1, x_2) = (x_1^2, x_2)$;

(d) $T(x_1, x_2) = (\sin x_1, x_2)$;

(e) $T(x_1, x_2) = (x_1 - x_2, 0)$.
2. Find the range, rank, null space, and nullity for the zero transformation and 
the identity transformation on a finite-dimensional space V. 


3. Describe the range and the null space for the differentiation transformation of Example 2. Do the same for the integration transformation of Example 5.

4. Is there a linear transformation T from $R^3$ into $R^2$ such that $T(1, -1, 1) = (1, 0)$ and $T(1, 1, 1) = (0, 1)$?

5. If

$$
\begin{aligned}
\alpha_1 &= (1, -1), & \beta_1 &= (1, 0)\\
\alpha_2 &= (2, -1), & \beta_2 &= (0, 1)\\
\alpha_3 &= (-3, 2), & \beta_3 &= (1, 1)
\end{aligned}
$$

is there a linear transformation T from $R^2$ into $R^2$ such that $T\alpha_i = \beta_i$ for i = 1, 2 and 3?


6. Describe explicitly (as in Exercises 1 and 2) the linear transformation T from $F^2$ into $F^2$ such that $T\epsilon_1 = (a, b)$, $T\epsilon_2 = (c, d)$.

7. Let F be a subfield of the complex numbers and let T be the function from $F^3$ into $F^3$ defined by

$$
T(x_1, x_2, x_3) = (x_1 - x_2 + 2x_3,\ 2x_1 + x_2,\ -x_1 - 2x_2 + 2x_3).
$$

(a) Verify that T is a linear transformation.

(b) If (a, b, c) is a vector in $F^3$, what are the conditions on a, b, and c that the vector be in the range of T? What is the rank of T?

(c) What are the conditions on a, b, and c that (a, b, c) be in the null space of T? What is the nullity of T?


8. Describe explicitly a linear transformation from $R^3$ into $R^3$ which has as its range the subspace spanned by (1, 0, -1) and (1, 2, 2).


9. Let V be the vector space of all $n \times n$ matrices over the field F, and let B be a fixed $n \times n$ matrix. If

$$
T(A) = AB - BA
$$

verify that T is a linear transformation from V into V.


10. Let V be the set of all complex numbers regarded as a vector space over the field of real numbers (usual operations). Find a function from V into V which is a linear transformation on the above vector space, but which is not a linear transformation on $C^1$, i.e., which is not complex linear.


11. Let V be the space of $n \times 1$ matrices over F and let W be the space of $m \times 1$ matrices over F. Let A be a fixed $m \times n$ matrix over F and let T be the linear transformation from V into W defined by T(X) = AX. Prove that T is the zero transformation if and only if A is the zero matrix.


12. Let V be an n-dimensional vector space over the field F and let T be a linear 
transformation from V into V such that the range and null space of T are identical. 
Prove that n is even. (Can you give an example of such a linear transformation T?) 


13. Let V be a vector space and T a linear transformation from V into V. Prove 
that the following two statements about T are equivalent. 

(a) The intersection of the range of T and the null space of T is the zero 
subspace of V. 

(b) If $T(T\alpha) = 0$, then $T\alpha = 0$.


3.2. The Algebra of Linear Transformations 


In the study of linear transformations from V into W, it is of funda- 
mental importance that the set of these transformations inherits a natural 
vector space structure. The set of linear transformations from a space V 
into itself has even more algebraic structure, because ordinary composition 
of functions provides a ‘multiplication’ of such transformations. We shall 
explore these ideas in this section. 


Theorem 4. Let V and W be vector spaces over the field F. Let T and U be linear transformations from V into W. The function (T + U) defined by

$$
(T + U)(\alpha) = T\alpha + U\alpha
$$

is a linear transformation from V into W. If c is any element of F, the function (cT) defined by

$$
(cT)(\alpha) = c(T\alpha)
$$

is a linear transformation from V into W. The set of all linear transformations from V into W, together with the addition and scalar multiplication defined above, is a vector space over the field F.


Proof. Suppose T and U are linear transformations from V into W and that we define (T + U) as above. Then

$$
\begin{aligned}
(T + U)(c\alpha + \beta) &= T(c\alpha + \beta) + U(c\alpha + \beta)\\
&= c(T\alpha) + T\beta + c(U\alpha) + U\beta\\
&= c(T\alpha + U\alpha) + (T\beta + U\beta)\\
&= c(T + U)(\alpha) + (T + U)(\beta)
\end{aligned}
$$

which shows that (T + U) is a linear transformation. Similarly,

$$
\begin{aligned}
(cT)(d\alpha + \beta) &= c[T(d\alpha + \beta)]\\
&= c[d(T\alpha) + T\beta]\\
&= cd(T\alpha) + c(T\beta)\\
&= d[c(T\alpha)] + c(T\beta)\\
&= d[(cT)\alpha] + (cT)\beta
\end{aligned}
$$

which shows that (cT) is a linear transformation.

To verify that the set of linear transformations of V into W (together 
with these operations) is a vector space, one must directly check each of 
the conditions on the vector addition and scalar multiplication. We leave 
the bulk of this to the reader, and content ourselves with this comment: 
The zero vector in this space will be the zero transformation, which sends 
every vector of V into the zero vector in W; each of the properties of the 
two operations follows from the corresponding property of the operations 
in the space W. ∎


We should perhaps mention another way of looking at this theorem. 
If one defines sum and scalar multiple as we did above, then the set of 
all functions from V into W becomes a vector space over the field F. This 
has nothing to do with the fact that V is a vector space, only that V is a 
non-empty set. When V is a vector space we can define a linear transforma- 
tion from V into W, and Theorem 4 says that the linear transformations 
are a subspace of the space of all functions from V into W. 

We shall denote the space of linear transformations from V into W 
by L(V, W). We remind the reader that L(V, W) is defined only when V 
and W are vector spaces over the same field. 


Theorem 5. Let V be an n-dimensional vector space over the field F, and let W be an m-dimensional vector space over F. Then the space L(V, W) is finite-dimensional and has dimension mn.


Proof. Let

$$
\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\} \quad\text{and}\quad \mathcal{B}' = \{\beta_1, \ldots, \beta_m\}
$$

be ordered bases for V and W, respectively. For each pair of integers (p, q) with $1 \le p \le m$ and $1 \le q \le n$, we define a linear transformation $E^{p,q}$ from V into W by

$$
E^{p,q}(\alpha_i) =
\begin{cases}
0, & \text{if } i \ne q\\
\beta_p, & \text{if } i = q
\end{cases}
\;=\; \delta_{iq}\beta_p.
$$

According to Theorem 1, there is a unique linear transformation from V into W satisfying these conditions. The claim is that the mn transformations $E^{p,q}$ form a basis for L(V, W).

Let T be a linear transformation from V into W. For each j, $1 \le j \le n$, let $A_{1j}, \ldots, A_{mj}$ be the coordinates of the vector $T\alpha_j$ in the ordered basis $\mathcal{B}'$, i.e.,

$$
T\alpha_j = \sum_{p=1}^{m} A_{pj}\beta_p. \tag{3-1}
$$
We wish to show that

$$
T = \sum_{p=1}^{m}\sum_{q=1}^{n} A_{pq}E^{p,q}. \tag{3-2}
$$

Let U be the linear transformation in the right-hand member of (3-2). Then for each j

$$
\begin{aligned}
U\alpha_j &= \sum_{p}\sum_{q} A_{pq}E^{p,q}(\alpha_j)\\
&= \sum_{p}\sum_{q} A_{pq}\delta_{jq}\beta_p\\
&= \sum_{p=1}^{m} A_{pj}\beta_p\\
&= T\alpha_j
\end{aligned}
$$

and consequently U = T. Now (3-2) shows that the $E^{p,q}$ span L(V, W); we must prove that they are independent. But this is clear from what we did above; for, if the transformation

$$
U = \sum_{p}\sum_{q} A_{pq}E^{p,q}
$$

is the zero transformation, then $U\alpha_j = 0$ for each j, so

$$
\sum_{p=1}^{m} A_{pj}\beta_p = 0
$$

and the independence of the $\beta_p$ implies that $A_{pj} = 0$ for every p and j. ∎
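In coordinates the proof of Theorem 5 is very concrete. Assume, for this sketch only, that V = $F^{n \times 1}$ and W = $F^{m \times 1}$ with their standard bases, and represent a linear transformation by the m × n array $A_{pj}$ of (3-1), whose j-th column holds the coordinates of $T\alpha_j$. Then $E^{p,q}$ is represented by the array with a single 1 in row p, column q, and (3-2) says every array is a combination of these mn arrays. Indices start at 0 in the code, and the helper names are hypothetical.

```python
# Illustrative sketch; the helper names are hypothetical.
def E(p, q, m, n):
    """Array of E^{p,q}: E^{p,q}(alpha_i) = delta_{iq} beta_p, so the only
    non-zero entry is a 1 in row p, column q."""
    return [[1 if (i == p and j == q) else 0 for j in range(n)] for i in range(m)]


def apply(M, x):
    """The array M applied to the coordinate column x (given as a list)."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def expand(A):
    """The sum over p, q of A[p][q] * E^{p,q}, as in (3-2)."""
    m, n = len(A), len(A[0])
    total = [[0] * n for _ in range(m)]
    for p in range(m):
        for q in range(n):
            Epq = E(p, q, m, n)
            total = [[t + A[p][q] * e for t, e in zip(trow, erow)]
                     for trow, erow in zip(total, Epq)]
    return total


m, n = 2, 3
units = [E(p, q, m, n) for p in range(m) for q in range(n)]
print(len(units))        # 6, i.e. m * n
A = [[1, -2, 0], [4, 5, -6]]
print(expand(A) == A)    # True
```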


Theorem 6. Let V, W, and Z be vector spaces over the field F. Let T be a linear transformation from V into W and U a linear transformation from W into Z. Then the composed function UT defined by $(UT)(\alpha) = U(T(\alpha))$ is a linear transformation from V into Z.

Proof.

$$
\begin{aligned}
(UT)(c\alpha + \beta) &= U[T(c\alpha + \beta)]\\
&= U(cT\alpha + T\beta)\\
&= c[U(T\alpha)] + U(T\beta)\\
&= c(UT)(\alpha) + (UT)(\beta).
\end{aligned}
$$

∎


In what follows, we shall be primarily concerned with linear transformations of a vector space into itself. Since we would so often have to write ‘T is a linear transformation from V into V,’ we shall replace this with ‘T is a linear operator on V.’


Definition. If V is a vector space over the field F, a linear operator on V is a linear transformation from V into V.
In the case of Theorem 6 when V = W = Z, so that U and T are linear operators on the space V, we see that the composition UT is again a linear operator on V. Thus the space L(V, V) has a ‘multiplication’ defined on it by composition. In this case the operator TU is also defined, and one should note that in general $UT \ne TU$, i.e., $UT - TU \ne 0$. We should take special note of the fact that if T is a linear operator on V then we can compose T with T. We shall use the notation $T^2 = TT$, and in general $T^n = T \cdots T$ (n times) for n = 1, 2, 3, .... We define $T^0 = I$ if $T \ne 0$.


Lemma. Let V be a vector space over the field F; let U, $T_1$ and $T_2$ be linear operators on V; let c be an element of F.
(a) IU = UI = U;
(b) $U(T_1 + T_2) = UT_1 + UT_2$; $(T_1 + T_2)U = T_1U + T_2U$;
(c) $c(UT_1) = (cU)T_1 = U(cT_1)$.

Proof. (a) This property of the identity function is obvious. We have stated it here merely for emphasis.

(b)

$$
\begin{aligned}
[U(T_1 + T_2)](\alpha) &= U[(T_1 + T_2)(\alpha)]\\
&= U(T_1\alpha + T_2\alpha)\\
&= U(T_1\alpha) + U(T_2\alpha)\\
&= (UT_1)(\alpha) + (UT_2)(\alpha)
\end{aligned}
$$

so that $U(T_1 + T_2) = UT_1 + UT_2$. Also

$$
\begin{aligned}
[(T_1 + T_2)U](\alpha) &= (T_1 + T_2)(U\alpha)\\
&= T_1(U\alpha) + T_2(U\alpha)\\
&= (T_1U)(\alpha) + (T_2U)(\alpha)
\end{aligned}
$$

so that $(T_1 + T_2)U = T_1U + T_2U$. (The reader may note that the proofs of these two distributive laws do not use the fact that $T_1$ and $T_2$ are linear, and the proof of the second one does not use the fact that U is linear either.)

(c) We leave the proof of part (c) to the reader. ∎


The contents of this lemma and a portion of Theorem 5 tell us that 
the vector space L(V, V), together with the composition operation, is 
what is known as a linear algebra with identity. We shall discuss this in 
Chapter 4. 


EXAMPLE 8. If A is an $m \times n$ matrix with entries in F, we have the linear transformation T defined by T(X) = AX, from $F^{n \times 1}$ into $F^{m \times 1}$. If B is a $p \times m$ matrix, we have the linear transformation U from $F^{m \times 1}$ into $F^{p \times 1}$ defined by U(Y) = BY. The composition UT is easily described:

$$
\begin{aligned}
(UT)(X) &= U(T(X))\\
&= U(AX)\\
&= B(AX)\\
&= (BA)X.
\end{aligned}
$$

Thus UT is ‘left multiplication by the product matrix BA.’
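A quick numerical check of Example 8: applying T and then U to a column agrees with a single left multiplication by BA. The matrices below are arbitrary sample values chosen for illustration, and the helper names are hypothetical.

```python
# Illustrative sketch; the sample matrices and helper names are hypothetical.
def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[1, 2, 0], [0, 1, -1]]             # 2 x 3: T maps F^{3x1} into F^{2x1}
B = [[3, 1], [0, 2], [1, -1], [2, 0]]   # 4 x 2: U maps F^{2x1} into F^{4x1}
X = [[1], [-2], [5]]                    # a 3 x 1 column


def T(X):
    return matmul(A, X)


def U(Y):
    return matmul(B, Y)


print(U(T(X)))                 # [[-16], [-14], [4], [-6]]
print(matmul(matmul(B, A), X)) # [[-16], [-14], [4], [-6]]
```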
EXAMPLE 9. Let F be a field and V the vector space of all polynomial functions from F into F. Let D be the differentiation operator defined in Example 2, and let T be the linear operator ‘multiplication by x’:

$$
(Tf)(x) = xf(x).
$$

Then $DT \ne TD$. In fact, the reader should find it easy to verify that $DT - TD = I$, the identity operator.
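The identity DT - TD = I is easy to confirm on polynomials stored as coefficient lists $[c_0, c_1, \ldots, c_k]$. The sketch assumes that representation (our choice, not the text's) and uses hypothetical helper names; it applies both compositions to a sample polynomial.

```python
# Illustrative sketch; the coefficient-list representation and helper names are assumptions.
def trim(f):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def D(f):
    """Differentiation: c0 + c1 x + ... + ck x^k -> c1 + 2 c2 x + ... + k ck x^(k-1)."""
    return trim([k * f[k] for k in range(1, len(f))])


def T(f):
    """Multiplication by x: shifts every coefficient up one degree."""
    return trim([0] + list(f))


def sub(f, g):
    """Coefficient-wise difference f - g."""
    n = max(len(f), len(g))
    f = list(f) + [0] * (n - len(f))
    g = list(g) + [0] * (n - len(g))
    return trim([a - b for a, b in zip(f, g)])


f = [5, -1, 0, 3]                 # 5 - x + 3x^3
print(D(T(f)), T(D(f)))           # [5, -2, 0, 12] [0, -1, 0, 9]
print(sub(D(T(f)), T(D(f))))      # [5, -1, 0, 3], i.e. f itself
```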
Even though the ‘multiplication’ we have on L(V, V) is not commu- 
tative, it is nicely related to the vector space operations of L(V, V). 


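EXAMPLE 10. Let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be an ordered basis for a vector space V. Consider the linear operators $E^{p,q}$ which arose in the proof of Theorem 5:

$$
E^{p,q}(\alpha_i) = \delta_{iq}\alpha_p.
$$

These $n^2$ linear operators form a basis for the space of linear operators on V. What is $E^{p,q}E^{r,s}$? We have

$$
\begin{aligned}
(E^{p,q}E^{r,s})(\alpha_i) &= E^{p,q}(\delta_{is}\alpha_r)\\
&= \delta_{is}E^{p,q}(\alpha_r)\\
&= \delta_{is}\delta_{rq}\alpha_p.
\end{aligned}
$$

Therefore,

$$
E^{p,q}E^{r,s} =
\begin{cases}
0, & \text{if } r \ne q\\
E^{p,s}, & \text{if } q = r.
\end{cases}
$$

Let T be a linear operator on V. We showed in the proof of Theorem 5 that if

$$
A_j = [T\alpha_j]_{\mathcal{B}}, \qquad A = [A_1, \ldots, A_n]
$$

then

$$
T = \sum_{p}\sum_{q} A_{pq}E^{p,q}.
$$

If

$$
U = \sum_{r}\sum_{s} B_{rs}E^{r,s}
$$

is another linear operator on V, then the last lemma tells us that

$$
\begin{aligned}
TU &= \Bigl(\sum_{p}\sum_{q} A_{pq}E^{p,q}\Bigr)\Bigl(\sum_{r}\sum_{s} B_{rs}E^{r,s}\Bigr)\\
&= \sum_{p}\sum_{q}\sum_{r}\sum_{s} A_{pq}B_{rs}E^{p,q}E^{r,s}.
\end{aligned}
$$

As we have noted, the only terms which survive in this huge sum are the terms where q = r, and since $E^{p,r}E^{r,s} = E^{p,s}$, we have

$$
\begin{aligned}
TU &= \sum_{p}\sum_{s}\Bigl(\sum_{r} A_{pr}B_{rs}\Bigr)E^{p,s}\\
&= \sum_{p}\sum_{s} (AB)_{ps}E^{p,s}.
\end{aligned}
$$

Thus, the effect of composing T and U is to multiply the matrices A and B.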
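Assuming operators on an n-dimensional space are represented by n × n arrays relative to a fixed ordered basis (as in the proof of Theorem 5, with composition corresponding to the matrix product), $E^{p,q}$ becomes the array with a single 1 in row p, column q. The sketch below, with hypothetical helper names, 0-based indices and arbitrary sample arrays A and B, checks the product rule for the $E^{p,q}$ and then evaluates the ‘huge sum’ for TU term by term, confirming that it equals AB.

```python
from itertools import product

# Illustrative sketch; the sample arrays and helper names are hypothetical.


def E(p, q, n):
    """Array of E^{p,q} relative to the basis: a single 1 in row p, column q."""
    return [[1 if (i == p and j == q) else 0 for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def scale(c, X):
    return [[c * a for a in r] for r in X]


n = 3
zero = [[0] * n for _ in range(n)]

# E^{p,q} E^{r,s} equals E^{p,s} when q = r and 0 otherwise.
rule_holds = all(
    matmul(E(p, q, n), E(r, s, n)) == (E(p, s, n) if q == r else zero)
    for p, q, r, s in product(range(n), repeat=4)
)

# T = sum A_pq E^{p,q}, U = sum B_rs E^{r,s}; expand TU term by term.
A = [[1, 2, 0], [-1, 3, 4], [2, 0, 1]]
B = [[0, 1, 1], [2, -1, 0], [1, 1, 3]]
huge_sum = zero
for p, q, r, s in product(range(n), repeat=4):
    term = scale(A[p][q] * B[r][s], matmul(E(p, q, n), E(r, s, n)))
    huge_sum = add(huge_sum, term)

print(rule_holds, huge_sum == matmul(A, B))   # True True
```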
In our discussion of algebraic operations with linear transformations we have not yet said anything about invertibility. One specific question of interest is this. For which linear operators T on the space V does there exist a linear operator $T^{-1}$ such that $TT^{-1} = T^{-1}T = I$?

The function T from V into W is called invertible if there exists a function U from W into V such that UT is the identity function on V and TU is the identity function on W. If T is invertible, the function U is unique and is denoted by $T^{-1}$. (See Appendix.) Furthermore, T is invertible if and only if

1. T is 1:1, that is, $T\alpha = T\beta$ implies $\alpha = \beta$;
2. T is onto, that is, the range of T is (all of) W.


Theorem 7. Let V and W be vector spaces over the field F and let T be a linear transformation from V into W. If T is invertible, then the inverse function $T^{-1}$ is a linear transformation from W onto V.

Proof. We repeat ourselves in order to underscore a point. When T is one-one and onto, there is a uniquely determined inverse function $T^{-1}$ which maps W onto V such that $T^{-1}T$ is the identity function on V, and $TT^{-1}$ is the identity function on W. What we are proving here is that if a linear function T is invertible, then the inverse $T^{-1}$ is also linear.

Let $\beta_1$ and $\beta_2$ be vectors in W and let c be a scalar. We wish to show that

$$
T^{-1}(c\beta_1 + \beta_2) = cT^{-1}\beta_1 + T^{-1}\beta_2.
$$

Let $\alpha_i = T^{-1}\beta_i$, i = 1, 2, that is, let $\alpha_i$ be the unique vector in V such that $T\alpha_i = \beta_i$. Since T is linear,

$$
\begin{aligned}
T(c\alpha_1 + \alpha_2) &= cT\alpha_1 + T\alpha_2\\
&= c\beta_1 + \beta_2.
\end{aligned}
$$

Thus $c\alpha_1 + \alpha_2$ is the unique vector in V which is sent by T into $c\beta_1 + \beta_2$, and so

$$
\begin{aligned}
T^{-1}(c\beta_1 + \beta_2) &= c\alpha_1 + \alpha_2\\
&= c(T^{-1}\beta_1) + T^{-1}\beta_2
\end{aligned}
$$

and $T^{-1}$ is linear. ∎


Suppose that we have an invertible linear transformation T from V onto W and an invertible linear transformation U from W onto Z. Then UT is invertible and $(UT)^{-1} = T^{-1}U^{-1}$. That conclusion does not require the linearity nor does it involve checking separately that UT is 1:1 and onto. All it involves is verifying that $T^{-1}U^{-1}$ is both a left and a right inverse for UT: $(T^{-1}U^{-1})(UT) = T^{-1}(U^{-1}U)T = T^{-1}T$ is the identity on V, and $(UT)(T^{-1}U^{-1}) = U(TT^{-1})U^{-1} = UU^{-1}$ is the identity on Z.
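For operators on $F^2$ represented by 2 × 2 matrices acting on columns by left multiplication (so that composition is the matrix product, as in Example 8), the identity $(UT)^{-1} = T^{-1}U^{-1}$ can be checked directly. The matrices T and U below are arbitrary invertible sample values chosen for illustration, and `inv2` is a hypothetical helper based on the adjugate formula.

```python
from fractions import Fraction

# Illustrative sketch; the sample matrices and helper names are hypothetical.


def mul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def inv2(M):
    """Inverse of an invertible 2 x 2 matrix (adjugate divided by determinant)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


I = [[1, 0], [0, 1]]
T = [[1, 1], [1, 0]]
U = [[1, 2], [0, 1]]
UT = mul(U, T)
candidate = mul(inv2(T), inv2(U))                        # T^{-1} U^{-1}
print(mul(candidate, UT) == I, mul(UT, candidate) == I)  # True True
print(candidate == inv2(UT))                             # True
```

The order matters: here $U^{-1}T^{-1}$ is not the inverse of UT, because these two sample matrices do not commute.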

If T is linear, then $T(\alpha - \beta) = T\alpha - T\beta$; hence, $T\alpha = T\beta$ if and only if $T(\alpha - \beta) = 0$. This simplifies enormously the verification that T is 1:1. Let us call a linear transformation T non-singular if $T\gamma = 0$ implies $\gamma = 0$, i.e., if the null space of T is {0}. Evidently, T is 1:1 if and only if T is non-singular. The extension of this remark is that non-singular linear transformations are those which preserve linear independence.


Theorem 8. Let T be a linear transformation from V into W. Then 
T is non-singular if and only if T carries each linearly independent subset of 
V onto a linearly independent subset of W. 


Proof. First suppose that T is non-singular. Let S be a linearly independent subset of V. If $\alpha_1, \ldots, \alpha_k$ are vectors in S, then the vectors $T\alpha_1, \ldots, T\alpha_k$ are linearly independent; for if

$$
c_1(T\alpha_1) + \cdots + c_k(T\alpha_k) = 0
$$

then

$$
T(c_1\alpha_1 + \cdots + c_k\alpha_k) = 0
$$

and since T is non-singular

$$
c_1\alpha_1 + \cdots + c_k\alpha_k = 0
$$

from which it follows that each $c_i = 0$ because S is an independent set. This argument shows that the image of S under T is independent.

Suppose that T carries independent subsets onto independent subsets. Let $\alpha$ be a non-zero vector in V. Then the set S consisting of the one vector $\alpha$ is independent. The image of S is the set consisting of the one vector $T\alpha$, and this set is independent. Therefore $T\alpha \ne 0$, because the set consisting of the zero vector alone is dependent. This shows that the null space of T is the zero subspace, i.e., T is non-singular. ∎


EXAMPLE 11. Let F be a subfield of the complex numbers (or a field of characteristic zero) and let V be the space of polynomial functions over F. Consider the differentiation operator D and the ‘multiplication by x’ operator T, from Example 9. Since D sends all constants into 0, D is singular; however, V is not finite dimensional, the range of D is all of V, and it is possible to define a right inverse for D. For example, if E is the indefinite integral operator:

$$
E(c_0 + c_1x + \cdots + c_nx^n) = c_0x + \frac{1}{2}c_1x^2 + \cdots + \frac{1}{n+1}c_nx^{n+1}
$$

then E is a linear operator on V and DE = I. On the other hand, $ED \ne I$ because ED sends the constants into 0. The operator T is in what we might call the reverse situation. If xf(x) = 0 for all x, then f = 0. Thus T is non-singular and it is possible to find a left inverse for T. For example if U is the operation ‘remove the constant term and divide by x’:

$$
U(c_0 + c_1x + \cdots + c_nx^n) = c_1 + c_2x + \cdots + c_nx^{n-1}
$$

then U is a linear operator on V and UT = I. But $TU \ne I$ since every function in the range of TU is in the range of T, which is the space of polynomial functions f such that f(0) = 0.
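The one-sided inverses of Example 11 can be checked on polynomials with rational coefficients stored as coefficient lists $[c_0, c_1, \ldots, c_n]$ (a representation we assume for this sketch; the helper names are hypothetical). DE and UT act as the identity, while ED and TU both lose the constant term.

```python
from fractions import Fraction

# Illustrative sketch; the coefficient-list representation and helper names are assumptions.


def trim(f):
    """Drop trailing zero coefficients so equal polynomials compare equal."""
    f = list(f)
    while f and f[-1] == 0:
        f.pop()
    return f


def D(f):
    """Differentiation."""
    return trim([k * f[k] for k in range(1, len(f))])


def E(f):
    """Indefinite integral with zero constant term: c_k x^k -> c_k x^(k+1) / (k+1)."""
    return trim([0] + [Fraction(f[k]) / (k + 1) for k in range(len(f))])


def T(f):
    """Multiplication by x."""
    return trim([0] + list(f))


def U(f):
    """Remove the constant term and divide by x."""
    return trim(list(f)[1:])


f = [3, 0, -2, 5]                               # 3 - 2x^2 + 5x^3
print(D(E(f)) == trim(f), U(T(f)) == trim(f))   # True True
print(E(D([7])), T(U([7])))                     # [] []  (the constant 7 is lost)
```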


EXAMPLE 12. Let F be a field and let T be the linear operator on $F^2$ defined by

$$
T(x_1, x_2) = (x_1 + x_2, x_1).
$$

Then T is non-singular, because if $T(x_1, x_2) = 0$ we have

$$
\begin{aligned}
x_1 + x_2 &= 0\\
x_1 &= 0
\end{aligned}
$$

so that $x_1 = x_2 = 0$. We also see that T is onto; for, let $(z_1, z_2)$ be any vector in $F^2$. To show that $(z_1, z_2)$ is in the range of T we must find scalars $x_1$ and $x_2$ such that

$$
\begin{aligned}
x_1 + x_2 &= z_1\\
x_1 &= z_2
\end{aligned}
$$

and the obvious solution is $x_1 = z_2$, $x_2 = z_1 - z_2$. This last computation gives us an explicit formula for $T^{-1}$, namely,

$$
T^{-1}(z_1, z_2) = (z_2, z_1 - z_2).
$$

We have seen in Example 11 that a linear transformation may be non-singular without being onto and may be onto without being non-singular. The present example illustrates an important case in which that cannot happen.
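Example 12 works over any field. As an illustration only, take F to be the field of integers modulo 7 (the prime 7 is an arbitrary choice); the sketch below checks by brute force that T and the formula for $T^{-1}$ are inverse to each other on all 49 vectors of $F^2$, and that T is onto. The function names are hypothetical.

```python
# Illustrative sketch over the field of integers modulo a prime; names are hypothetical.
P = 7  # the prime modulus, chosen arbitrarily


def T(x):
    """T(x1, x2) = (x1 + x2, x1), computed modulo P."""
    x1, x2 = x
    return ((x1 + x2) % P, x1 % P)


def T_inv(z):
    """The formula T^{-1}(z1, z2) = (z2, z1 - z2), computed modulo P."""
    z1, z2 = z
    return (z2 % P, (z1 - z2) % P)


vectors = [(a, b) for a in range(P) for b in range(P)]
print(all(T_inv(T(v)) == v and T(T_inv(v)) == v for v in vectors))   # True
print(len({T(v) for v in vectors}) == P * P)                         # True: T is onto
```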


Theorem 9. Let V and W be finite-dimensional vector spaces over the field F such that dim V = dim W. If T is a linear transformation from V into W, the following are equivalent:

(i) T is invertible.
(ii) T is non-singular.
(iii) T is onto, that is, the range of T is W.

Proof. Let n = dim V = dim W. From Theorem 2 we know that

$$
\operatorname{rank}(T) + \operatorname{nullity}(T) = n.
$$

Now T is non-singular if and only if nullity (T) = 0, and (since n = dim W) the range of T is W if and only if rank (T) = n. Since the rank plus the nullity is n, the nullity is 0 precisely when the rank is n. Therefore T is non-singular if and only if T(V) = W. So, if either condition (ii) or (iii) holds, the other is satisfied as well and T is invertible. ∎


We caution the reader not to apply Theorem 9 except in the presence 
of finite-dimensionality and with dim V = dim W. Under the hypotheses 
of Theorem 9, the conditions (i), (ii), and (iii) are also equivalent to these. 

(iv) If $\{\alpha_1, \ldots, \alpha_n\}$ is a basis for V, then $\{T\alpha_1, \ldots, T\alpha_n\}$ is a basis for W.

(v) There is some basis $\{\alpha_1, \ldots, \alpha_n\}$ for V such that $\{T\alpha_1, \ldots, T\alpha_n\}$ is a basis for W.

We shall give a proof of the equivalence of the five conditions which 
contains a different proof that (i), (ii), and (iii) are equivalent. 

(i) → (ii). If T is invertible, T is non-singular. (ii) → (iii). Suppose T is non-singular. Let $\{\alpha_1, \ldots, \alpha_n\}$ be a basis for V. By Theorem 8, $\{T\alpha_1, \ldots, T\alpha_n\}$ is a linearly independent set of vectors in W, and since the dimension of W is also n, this set of vectors is a basis for W. Now let $\beta$ be any vector in W. There are scalars $c_1, \ldots, c_n$ such that
$$\beta = c_1(T\alpha_1) + \cdots + c_n(T\alpha_n) = T(c_1\alpha_1 + \cdots + c_n\alpha_n)$$
which shows that $\beta$ is in the range of T. (iii) → (iv). We now assume that T is onto. If $\{\alpha_1, \ldots, \alpha_n\}$ is any basis for V, the vectors $T\alpha_1, \ldots, T\alpha_n$ span the range of T, which is all of W by assumption. Since the dimension of W is n, these n vectors must be linearly independent, that is, must comprise a basis for W. (iv) → (v). This requires no comment. (v) → (i). Suppose there is some basis $\{\alpha_1, \ldots, \alpha_n\}$ for V such that $\{T\alpha_1, \ldots, T\alpha_n\}$ is a basis for W. Since the $T\alpha_i$ span W, it is clear that the range of T is all of W. If $\alpha = c_1\alpha_1 + \cdots + c_n\alpha_n$ is in the null space of T, then
$$T(c_1\alpha_1 + \cdots + c_n\alpha_n) = 0$$
or
$$c_1(T\alpha_1) + \cdots + c_n(T\alpha_n) = 0$$
and since the $T\alpha_i$ are independent each $c_i = 0$, and thus $\alpha = 0$. We have shown that the range of T is W, and that T is non-singular, hence T is invertible.

The set of invertible linear operators on a space V, with the operation 
of composition, provides a nice example of what is known in algebra as 
a ‘group.’ Although we shall not have time to discuss groups in any detail, 
we shall at least give the definition. 


Definition. A group consists of the following. 


1. A set G;
2. A rule (or operation) which associates with each pair of elements x, y in G an element xy in G in such a way that
(a) $x(yz) = (xy)z$, for all x, y, and z in G (associativity);
(b) there is an element e in G such that $ex = xe = x$, for every x in G;
(c) to each element x in G there corresponds an element $x^{-1}$ in G such that $xx^{-1} = x^{-1}x = e$.


We have seen that composition $(U, T) \to UT$ associates with each pair of invertible linear operators on a space V another invertible operator on V. Composition is an associative operation. The identity operator I satisfies $IT = TI$ for each T, and for an invertible T there is (by Theorem 7) an invertible linear operator $T^{-1}$ such that $TT^{-1} = T^{-1}T = I$. Thus the set of invertible linear operators on V, together with this operation, is a group. The set of invertible $n \times n$ matrices with matrix multiplication as the operation is another example of a group. A group is called commutative if it satisfies the condition $xy = yx$ for each x and y. The two examples we gave above are not commutative groups, in general. One often writes the operation in a commutative group as $(x, y) \to x + y$, rather than $(x, y) \to xy$, and then uses the symbol 0 for the ‘identity’ element e. The set of vectors in a vector space, together with the operation of vector addition, is a commutative group. A field can be described as a set with two operations, called addition and multiplication, which is a commutative group under addition, and in which the non-zero elements form a commutative group under multiplication, with the distributive law $x(y + z) = xy + xz$ holding.
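The group axioms for invertible matrices can be checked on a finite sample. In this sketch (an illustration, not a proof) $2 \times 2$ matrices with rational entries are stored as tuples of rows; the helpers `mul`, `det`, and `inv` are hypothetical names, and the sample set `G` is only a finite collection of invertible matrices, not itself a subgroup. The last lines exhibit two matrices that do not commute.

```python
from fractions import Fraction
from itertools import product

def mul(X, Y):
    """Product of 2 x 2 matrices stored as tuples of rows."""
    return tuple(tuple(sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def inv(X):
    """Inverse of an invertible 2 x 2 matrix, via the adjugate formula."""
    d = Fraction(det(X))
    return ((X[1][1] / d, -X[0][1] / d), (-X[1][0] / d, X[0][0] / d))

I = ((1, 0), (0, 1))
# Invertible matrices with entries in {-1, 0, 1}: a finite sample, not a subgroup.
G = [M for M in (((a, b), (c, d)) for a, b, c, d in product((-1, 0, 1), repeat=4))
     if det(M) != 0]

for X in G:
    assert mul(I, X) == X == mul(X, I)               # (b) identity element
    assert mul(X, inv(X)) == I == mul(inv(X), X)     # (c) inverses
for X, Y, Z in product(G[:6], repeat=3):
    assert mul(X, mul(Y, Z)) == mul(mul(X, Y), Z)    # (a) associativity

# The group is not commutative:
X, Y = ((1, 1), (0, 1)), ((1, 0), (1, 1))
print(mul(X, Y), mul(Y, X))
```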


Exercises 


1. Let T and U be the linear operators on $R^2$ defined by
$$T(x_1, x_2) = (x_2, x_1) \quad\text{and}\quad U(x_1, x_2) = (x_1, 0).$$
(a) How would you describe T and U geometrically?
(b) Give rules like the ones defining T and U for each of the transformations $(U + T)$, $UT$, $TU$, $T^2$, $U^2$.


2. Let T be the (unique) linear operator on $C^3$ for which
$$T\epsilon_1 = (1, 0, i), \quad T\epsilon_2 = (0, 1, 1), \quad T\epsilon_3 = (i, 1, 0).$$
Is T invertible?

3. Let T be the linear operator on $R^3$ defined by
$$T(x_1, x_2, x_3) = (3x_1, x_1 - x_2, 2x_1 + x_2 + x_3).$$
Is T invertible? If so, find a rule for $T^{-1}$ like the one which defines T.

4. For the linear operator T of Exercise 3, prove that
$$(T^2 - I)(T - 3I) = 0.$$


5. Let $C^{2 \times 2}$ be the complex vector space of $2 \times 2$ matrices with complex entries. Let
$$B = \begin{bmatrix} 1 & -1 \\ -4 & 4 \end{bmatrix}$$
and let T be the linear operator on $C^{2 \times 2}$ defined by $T(A) = BA$. What is the rank of T? Can you describe $T^2$?


6. Let T be a linear transformation from $R^3$ into $R^2$, and let U be a linear transformation from $R^2$ into $R^3$. Prove that the transformation UT is not invertible. Generalize the theorem.




Linear Transformations Chap. 3 


7. Find two linear operators T and U on $R^2$ such that $TU = 0$ but $UT \neq 0$.


8. Let V be a vector space over the field F and T a linear operator on V. If $T^2 = 0$, what can you say about the relation of the range of T to the null space of T? Give an example of a linear operator T on $R^2$ such that $T^2 = 0$ but $T \neq 0$.


9. Let T be a linear operator on the finite-dimensional space V. Suppose there is a linear operator U on V such that $TU = I$. Prove that T is invertible and $U = T^{-1}$. Give an example which shows that this is false when V is not finite-dimensional. (Hint: Let $T = D$, the differentiation operator on the space of polynomial functions.)


10. Let A be an $m \times n$ matrix with entries in F and let T be the linear transformation from $F^{n \times 1}$ into $F^{m \times 1}$ defined by $T(X) = AX$. Show that if $m < n$ it may happen that T is onto without being non-singular. Similarly, show that if $m > n$ we may have T non-singular but not onto.


11. Let V be a finite-dimensional vector space and let T be a linear operator on V. Suppose that $\operatorname{rank}(T^2) = \operatorname{rank}(T)$. Prove that the range and null space of T are disjoint, i.e., have only the zero vector in common.


12. Let p, m, and n be positive integers and F a field. Let V be the space of $m \times n$ matrices over F and W the space of $p \times n$ matrices over F. Let B be a fixed $p \times m$ matrix and let T be the linear transformation from V into W defined by $T(A) = BA$. Prove that T is invertible if and only if $p = m$ and B is an invertible $m \times m$ matrix.


3.3. Isomorphism 


If V and W are vector spaces over the field F, any one-one linear 
transformation T of V onto W is called an isomorphism of V onto W. 
If there exists an isomorphism of V onto W, we say that V is isomorphic 
to W. 

Note that V is trivially isomorphic to V, the identity operator being an isomorphism of V onto V. Also, if V is isomorphic to W via an isomorphism T, then W is isomorphic to V, because $T^{-1}$ is an isomorphism of W onto V. The reader should find it easy to verify that if V is isomorphic to W and W is isomorphic to Z, then V is isomorphic to Z. Briefly,
isomorphism is an equivalence relation on the class of vector spaces. If 
there exists an isomorphism of V onto W, we may sometimes say that V 
and W are isomorphic, rather than V is isomorphic to W. This will cause 
no confusion because V is isomorphic to W if and only if W is isomorphic 
to V. 


Theorem 10. Every n-dimensional vector space over the field F is isomorphic to the space $F^n$.


Proof. Let V be an n-dimensional space over the field F and let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be an ordered basis for V. We define a function T from V into $F^n$, as follows: If $\alpha$ is in V, let $T\alpha$ be the n-tuple $(x_1, \ldots, x_n)$ of coordinates of $\alpha$ relative to the ordered basis $\mathcal{B}$, i.e., the n-tuple such that
$$\alpha = x_1\alpha_1 + \cdots + x_n\alpha_n.$$
In our discussion of coordinates in Chapter 2, we verified that this T is linear, one-one, and maps V onto $F^n$. ∎
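The isomorphism in this proof is just "take coordinates". A minimal sketch, assuming for illustration that V is $Q^2$ itself equipped with a non-standard ordered basis (the basis $(1, 1), (2, 1)$ is a hypothetical choice): finding $T\alpha$ means solving $\alpha = x_1\alpha_1 + \cdots + x_n\alpha_n$ for the $x_j$, done here by Gauss-Jordan elimination over the rationals. The helpers `solve` and `coordinate_map` are hypothetical names.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b for a square invertible M by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)  # StopIteration if M is singular
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * r for a, r in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

def coordinate_map(basis):
    """Return T with T(alpha) = (x_1, ..., x_n), where alpha = x_1 alpha_1 + ... + x_n alpha_n."""
    n = len(basis)
    M = [[basis[j][i] for j in range(n)] for i in range(n)]  # column j holds alpha_j
    return lambda alpha: tuple(solve(M, alpha))

# Hypothetical ordered basis of Q^2 (V = Q^2 itself, purely for illustration).
T = coordinate_map([(1, 1), (2, 1)])
alpha, beta, c = (1, 0), (3, -5), Fraction(2, 3)
# Linearity: T(c*alpha + beta) == c*T(alpha) + T(beta)
lhs = T(tuple(c * a + b for a, b in zip(alpha, beta)))
rhs = tuple(c * x + y for x, y in zip(T(alpha), T(beta)))
assert lhs == rhs
print(T(alpha))
```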


For many purposes one often regards isomorphic vector spaces as 
being ‘the same,’ although the vectors and operations in the spaces may 
be quite different, that is, one often identifies isomorphic spaces. We 
shall not attempt a lengthy discussion of this idea at present but shall 
let the understanding of isomorphism and the sense in which isomorphic 
spaces are ‘the same’ grow as we continue our study of vector spaces. 

We shall make a few brief comments. Suppose T is an isomorphism 
of V onto W. If S is a subset of V, then Theorem 8 tells us that S is linearly 
independent if and only if the set T(S) in W is independent. Thus in 
deciding whether S is independent it doesn’t matter whether we look at S 
or T(S). From this one sees that an isomorphism is ‘dimension preserving,’ 
that is, any finite-dimensional subspace of V has the same dimension as its 
image under T. Here is a very simple illustration of this idea. Suppose A 
is an $m \times n$ matrix over the field F. We have really given two definitions of the solution space of the matrix A. The first is the set of all n-tuples $(x_1, \ldots, x_n)$ in $F^n$ which satisfy each of the equations in the system $AX = 0$. The second is the set of all $n \times 1$ column matrices X such that $AX = 0$. The first solution space is thus a subspace of $F^n$ and the second is a subspace of the space of all $n \times 1$ matrices over F. Now there is a completely obvious isomorphism between $F^n$ and $F^{n \times 1}$, namely,
$$(x_1, \ldots, x_n) \to \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}.$$
Under this isomorphism, the first solution space of A is carried onto the second solution space. These spaces have the same dimension, and so if we want to prove a theorem about the dimension of the solution space, it is immaterial which space we choose to discuss. In fact, the reader would probably not balk if we chose to identify $F^n$ and the space of $n \times 1$ matrices. We may do this when it is convenient, and when it is not convenient we shall not.


Exercises 


1. Let V be the set of complex numbers and let F be the field of real numbers. With the usual operations, V is a vector space over F. Describe explicitly an isomorphism of this space onto $R^2$.
2. Let V be a vector space over the field of complex numbers, and suppose there is an isomorphism T of V onto $C^3$. Let $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ be vectors in V such that
$$T\alpha_1 = (1, 0, 0), \quad T\alpha_2 = (-2, 1 + i, 0),$$
$$T\alpha_3 = (-1, 1, 1), \quad T\alpha_4 = (\sqrt{2}, i, 3).$$
(a) Is $\alpha_1$ in the subspace spanned by $\alpha_2$ and $\alpha_3$?
(b) Let $W_1$ be the subspace spanned by $\alpha_1$ and $\alpha_2$, and let $W_2$ be the subspace spanned by $\alpha_3$ and $\alpha_4$. What is the intersection of $W_1$ and $W_2$?
(c) Find a basis for the subspace of V spanned by the four vectors $\alpha_j$.

3. Let W be the set of all $2 \times 2$ complex Hermitian matrices, that is, the set of $2 \times 2$ complex matrices A such that $A_{ij} = \overline{A_{ji}}$ (the bar denoting complex conjugation). As we pointed out in Example 6 of Chapter 2, W is a vector space over the field of real numbers, under the usual operations. Verify that
$$(x, y, z, t) \to \begin{bmatrix} t + x & y + iz \\ y - iz & t - x \end{bmatrix}$$
is an isomorphism of $R^4$ onto W.
4. Show that $F^{m \times n}$ is isomorphic to $F^{mn}$.
5. Let V be the set of complex numbers regarded as a vector space over the field of real numbers (Exercise 1). We define a function T from V into the space of $2 \times 2$ real matrices, as follows. If $z = x + iy$ with x and y real numbers, then
$$T(z) = \begin{bmatrix} x + 7y & 5y \\ -10y & x - 7y \end{bmatrix}.$$
(a) Verify that T is a one-one (real) linear transformation of V into the space of $2 \times 2$ real matrices.

(b) Verify that $T(z_1 z_2) = T(z_1)T(z_2)$.

(c) How would you describe the range of T?


6. Let V and W be finite-dimensional vector spaces over the field F. Prove that 
V and W are isomorphic if and only if dim V = dim W. 


7. Let V and W be vector spaces over the field F and let U be an isomorphism of V onto W. Prove that $T \to UTU^{-1}$ is an isomorphism of L(V, V) onto L(W, W).


3.4. Representation of Transformations 
by Matrices 


Let V be an n-dimensional vector space over the field F and let W be an m-dimensional vector space over F. Let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be an ordered basis for V and $\mathcal{B}' = \{\beta_1, \ldots, \beta_m\}$ an ordered basis for W. If T is any linear transformation from V into W, then T is determined by its action on the vectors $\alpha_j$. Each of the n vectors $T\alpha_j$ is uniquely expressible as a linear combination
$$T\alpha_j = \sum_{i=1}^{m} A_{ij}\beta_i \tag{3-3}$$




of the $\beta_i$, the scalars $A_{1j}, \ldots, A_{mj}$ being the coordinates of $T\alpha_j$ in the ordered basis $\mathcal{B}'$. Accordingly, the transformation T is determined by the mn scalars $A_{ij}$ via the formulas (3-3). The $m \times n$ matrix A defined by $A(i, j) = A_{ij}$ is called the matrix of T relative to the pair of ordered bases $\mathcal{B}$ and $\mathcal{B}'$. Our immediate task is to understand explicitly how the matrix A determines the linear transformation T.

If $\alpha = x_1\alpha_1 + \cdots + x_n\alpha_n$ is a vector in V, then
$$T\alpha = T\Big(\sum_{j=1}^{n} x_j\alpha_j\Big) = \sum_{j=1}^{n} x_j(T\alpha_j) = \sum_{j=1}^{n} x_j\sum_{i=1}^{m} A_{ij}\beta_i = \sum_{i=1}^{m}\Big(\sum_{j=1}^{n} A_{ij}x_j\Big)\beta_i.$$
If X is the coordinate matrix of $\alpha$ in the ordered basis $\mathcal{B}$, then the computation above shows that AX is the coordinate matrix of the vector $T\alpha$ in the ordered basis $\mathcal{B}'$, because the scalar
$$\sum_{j=1}^{n} A_{ij}x_j$$
is the entry in the ith row of the column matrix AX. Let us also observe that if A is any $m \times n$ matrix over the field F, then
$$T\Big(\sum_{j=1}^{n} x_j\alpha_j\Big) = \sum_{i=1}^{m}\Big(\sum_{j=1}^{n} A_{ij}x_j\Big)\beta_i \tag{3-4}$$
defines a linear transformation T from V into W, the matrix of which is A, relative to $\mathcal{B}$, $\mathcal{B}'$. We summarize formally:


Theorem 11. Let V be an n-dimensional vector space over the field F and W an m-dimensional vector space over F. Let $\mathcal{B}$ be an ordered basis for V and $\mathcal{B}'$ an ordered basis for W. For each linear transformation T from V into W, there is an $m \times n$ matrix A with entries in F such that
$$[T\alpha]_{\mathcal{B}'} = A[\alpha]_{\mathcal{B}}$$
for every vector $\alpha$ in V. Furthermore, $T \to A$ is a one-one correspondence between the set of all linear transformations from V into W and the set of all $m \times n$ matrices over the field F.


The matrix A which is associated with T in Theorem 11 is called the matrix of T relative to the ordered bases $\mathcal{B}$, $\mathcal{B}'$. Note that Equation (3-3) says that A is the matrix whose columns $A_1, \ldots, A_n$ are given by
$$A_j = [T\alpha_j]_{\mathcal{B}'}, \qquad j = 1, \ldots, n.$$
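The column description $A_j = [T\alpha_j]_{\mathcal{B}'}$ translates directly into a procedure. The sketch below is illustrative only: it assumes V = $Q^3$ and W = $Q^2$, a hypothetical map $T(x_1, x_2, x_3) = (x_1 + x_2, x_3)$, and hypothetical bases chosen for the demonstration. Coordinates are found by Gauss-Jordan elimination; the helper names `solve`, `coords`, `matrix_of`, and `apply` are not from the text. It then checks $[T\alpha]_{\mathcal{B}'} = A[\alpha]_{\mathcal{B}}$ on a sample vector.

```python
from fractions import Fraction

def solve(M, b):
    """Solve M x = b (M square, invertible) by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * r for a, r in zip(aug[i], aug[c])]
    return [row[n] for row in aug]

def coords(v, basis):
    """[v]_basis for a vector v of Q^k and an ordered basis given as a list of tuples."""
    k = len(basis)
    return solve([[basis[j][i] for j in range(k)] for i in range(k)], v)

def matrix_of(T, basis_V, basis_W):
    """Matrix of T relative to (basis_V, basis_W): column j is [T alpha_j] in basis_W."""
    cols = [coords(T(a), basis_W) for a in basis_V]
    return [[cols[j][i] for j in range(len(cols))] for i in range(len(basis_W))]

def apply(A, x):
    """The column matrix A x, as a list."""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

T = lambda x: (x[0] + x[1], x[2])            # hypothetical T from Q^3 into Q^2
BV = [(1, 1, 0), (0, 1, 1), (1, 0, 1)]       # hypothetical ordered basis for V
BW = [(1, 1), (1, -1)]                       # hypothetical ordered basis for W
A = matrix_of(T, BV, BW)

alpha = (2, -3, 5)
assert coords(T(alpha), BW) == apply(A, coords(alpha, BV))   # [T a]_B' = A [a]_B
```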




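If U is another linear transformation from V into W and $B = [B_1, \ldots, B_n]$ is the matrix of U relative to the ordered bases $\mathcal{B}$, $\mathcal{B}'$, then $cA + B$ is the matrix of $cT + U$ relative to $\mathcal{B}$, $\mathcal{B}'$. That is clear because
$$\begin{aligned} cA_j + B_j &= c[T\alpha_j]_{\mathcal{B}'} + [U\alpha_j]_{\mathcal{B}'} \\ &= [cT\alpha_j + U\alpha_j]_{\mathcal{B}'} \\ &= [(cT + U)\alpha_j]_{\mathcal{B}'}. \end{aligned}$$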


Theorem 12. Let V be an n-dimensional vector space over the field F and let W be an m-dimensional vector space over F. For each pair of ordered bases $\mathcal{B}$, $\mathcal{B}'$ for V and W respectively, the function which assigns to a linear transformation T its matrix relative to $\mathcal{B}$, $\mathcal{B}'$ is an isomorphism between the space L(V, W) and the space of all $m \times n$ matrices over the field F.


Proof. We observed above that the function in question is linear, and as stated in Theorem 11, this function is one-one and maps L(V, W) onto the set of $m \times n$ matrices. ∎


We shall be particularly interested in the representation by matrices of linear transformations of a space into itself, i.e., linear operators on a space V. In this case it is most convenient to use the same ordered basis in each case, that is, to take $\mathcal{B} = \mathcal{B}'$. We shall then call the representing matrix simply the matrix of T relative to the ordered basis $\mathcal{B}$. Since this concept will be so important to us, we shall review its definition. If T is a linear operator on the finite-dimensional vector space V and $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ is an ordered basis for V, the matrix of T relative to $\mathcal{B}$ (or, the matrix of T in the ordered basis $\mathcal{B}$) is the $n \times n$ matrix A whose entries $A_{ij}$ are defined by the equations
$$T\alpha_j = \sum_{i=1}^{n} A_{ij}\alpha_i, \qquad j = 1, \ldots, n. \tag{3-5}$$
One must always remember that this matrix representing T depends upon the ordered basis $\mathcal{B}$, and that there is a representing matrix for T in each ordered basis for V. (For transformations of one space into another the matrix depends upon two ordered bases, one for V and one for W.) In order that we shall not forget this dependence, we shall use the notation
$$[T]_{\mathcal{B}}$$
for the matrix of the linear operator T in the ordered basis $\mathcal{B}$. The manner in which this matrix and the ordered basis describe T is that for each $\alpha$ in V
$$[T\alpha]_{\mathcal{B}} = [T]_{\mathcal{B}}[\alpha]_{\mathcal{B}}.$$


EXAMPLE 13. Let V be the space of $n \times 1$ column matrices over the field F; let W be the space of $m \times 1$ matrices over F; and let A be a fixed $m \times n$ matrix over F. Let T be the linear transformation of V into W defined by $T(X) = AX$. Let $\mathcal{B}$ be the ordered basis for V analogous to the standard basis in $F^n$, i.e., the ith vector in $\mathcal{B}$ is the $n \times 1$ matrix $X_i$ with a 1 in row i and all other entries 0. Let $\mathcal{B}'$ be the corresponding ordered basis for W, i.e., the jth vector in $\mathcal{B}'$ is the $m \times 1$ matrix $Y_j$ with a 1 in row j and all other entries 0. Then the matrix of T relative to the pair $\mathcal{B}$, $\mathcal{B}'$ is the matrix A itself. This is clear because the matrix $AX_j$ is the jth column of A.


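EXAMPLE 14. Let F be a field and let T be the operator on $F^2$ defined by
$$T(x_1, x_2) = (x_1, 0).$$
It is easy to see that T is a linear operator on $F^2$. Let $\mathcal{B}$ be the standard ordered basis for $F^2$, $\mathcal{B} = \{\epsilon_1, \epsilon_2\}$. Now
$$\begin{aligned} T\epsilon_1 &= T(1, 0) = (1, 0) = 1\epsilon_1 + 0\epsilon_2 \\ T\epsilon_2 &= T(0, 1) = (0, 0) = 0\epsilon_1 + 0\epsilon_2 \end{aligned}$$
so the matrix of T in the ordered basis $\mathcal{B}$ is
$$[T]_{\mathcal{B}} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$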


EXAMPLE 15. Let V be the space of all polynomial functions from R into R of the form
$$f(x) = c_0 + c_1x + c_2x^2 + c_3x^3$$
that is, the space of polynomial functions of degree three or less. The differentiation operator D of Example 2 maps V into V, since D is ‘degree decreasing.’ Let $\mathcal{B}$ be the ordered basis for V consisting of the four functions $f_1, f_2, f_3, f_4$ defined by $f_j(x) = x^{j-1}$. Then
$$\begin{aligned} (Df_1)(x) &= 0, & Df_1 &= 0f_1 + 0f_2 + 0f_3 + 0f_4 \\ (Df_2)(x) &= 1, & Df_2 &= 1f_1 + 0f_2 + 0f_3 + 0f_4 \\ (Df_3)(x) &= 2x, & Df_3 &= 0f_1 + 2f_2 + 0f_3 + 0f_4 \\ (Df_4)(x) &= 3x^2, & Df_4 &= 0f_1 + 0f_2 + 3f_3 + 0f_4 \end{aligned}$$
so that the matrix of D in the ordered basis $\mathcal{B}$ is
$$[D]_{\mathcal{B}} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
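Since a polynomial of degree at most three is determined by its coefficient list, its coordinates in the basis $\{f_1, f_2, f_3, f_4\}$ are simply its coefficients. Under that representation (an implementation choice made here, not something the text prescribes), the sketch below rebuilds $[D]_{\mathcal{B}}$ column by column from (3-5) and checks $[Dp]_{\mathcal{B}} = [D]_{\mathcal{B}}[p]_{\mathcal{B}}$ for one hypothetical sample polynomial.

```python
def D(c):
    """Differentiate c0 + c1 x + c2 x^2 + c3 x^3, stored as the list [c0, c1, c2, c3]."""
    return [k * c[k] for k in range(1, len(c))] + [0]

n = 4
# f_j(x) = x^(j-1); its coordinate list in the basis {f_1, ..., f_4} is a unit vector.
basis = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
cols = [D(f) for f in basis]                   # [D f_j]_B, the j-th column
D_matrix = [[cols[j][i] for j in range(n)] for i in range(n)]

# [Dp]_B = [D]_B [p]_B for the sample p(x) = 5 - x + 2x^2 + 7x^3.
p = [5, -1, 2, 7]
assert D(p) == [sum(D_matrix[i][j] * p[j] for j in range(n)) for i in range(n)]
for row in D_matrix:
    print(row)
```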

We have seen what happens to representing matrices when transformations are added, namely, that the matrices add. We should now like to ask what happens when we compose transformations. More specifically, let V, W, and Z be vector spaces over the field F of respective dimensions n, m, and p. Let T be a linear transformation from V into W and U a linear transformation from W into Z. Suppose we have ordered bases
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}, \quad \mathcal{B}' = \{\beta_1, \ldots, \beta_m\}, \quad \mathcal{B}'' = \{\gamma_1, \ldots, \gamma_p\}$$
for the respective spaces V, W, and Z. Let A be the matrix of T relative to the pair $\mathcal{B}$, $\mathcal{B}'$ and let B be the matrix of U relative to the pair $\mathcal{B}'$, $\mathcal{B}''$. It is then easy to see that the matrix C of the transformation UT relative to the pair $\mathcal{B}$, $\mathcal{B}''$ is the product of B and A; for, if $\alpha$ is any vector in V
$$[T\alpha]_{\mathcal{B}'} = A[\alpha]_{\mathcal{B}}$$
$$[U(T\alpha)]_{\mathcal{B}''} = B[T\alpha]_{\mathcal{B}'}$$
and so
$$[(UT)(\alpha)]_{\mathcal{B}''} = BA[\alpha]_{\mathcal{B}}$$


and hence, by the definition and uniqueness of the representing matrix, we must have C = BA. One can also see this by carrying out the computation
$$\begin{aligned} (UT)(\alpha_j) &= U(T\alpha_j) \\ &= U\Big(\sum_{k=1}^{m} A_{kj}\beta_k\Big) \\ &= \sum_{k=1}^{m} A_{kj}(U\beta_k) \\ &= \sum_{k=1}^{m} A_{kj}\sum_{i=1}^{p} B_{ik}\gamma_i \\ &= \sum_{i=1}^{p}\Big(\sum_{k=1}^{m} B_{ik}A_{kj}\Big)\gamma_i \end{aligned}$$
so that we must have
$$C_{ij} = \sum_{k=1}^{m} B_{ik}A_{kj}. \tag{3-6}$$
We motivated the definition (3-6) of matrix multiplication via operations on the rows of a matrix. One sees here that a very strong motivation for the definition is to be found in composing linear transformations. Let us summarize formally.


Theorem 13. Let V, W, and Z be finite-dimensional vector spaces over the field F; let T be a linear transformation from V into W and U a linear transformation from W into Z. If $\mathcal{B}$, $\mathcal{B}'$, and $\mathcal{B}''$ are ordered bases for the spaces V, W, and Z, respectively, if A is the matrix of T relative to the pair $\mathcal{B}$, $\mathcal{B}'$, and B is the matrix of U relative to the pair $\mathcal{B}'$, $\mathcal{B}''$, then the matrix of the composition UT relative to the pair $\mathcal{B}$, $\mathcal{B}''$ is the product matrix $C = BA$.
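Theorem 13 can be seen numerically by building the matrix of UT column by column, as the images $(UT)(\epsilon_j)$, and comparing it with the product given by (3-6). To keep the sketch short it assumes standard ordered bases throughout, so that T and U are simply $X \to AX$ and $Y \to BY$; the matrices A and B are hypothetical, and the helpers `matmul` and `as_map` are illustrative names.

```python
def matmul(B, A):
    """C = BA with C[i][j] = sum_k B[i][k] * A[k][j]  (formula (3-6))."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]

def as_map(M):
    """The linear map X -> MX, with vectors as lists (standard ordered bases assumed)."""
    return lambda x: [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

A = [[1, 2, 0], [0, -1, 3]]              # hypothetical T: F^3 -> F^2  (n = 3, m = 2)
B = [[2, 1], [0, 1], [1, -1], [4, 0]]    # hypothetical U: F^2 -> F^4  (p = 4)
T, U = as_map(A), as_map(B)

# Column j of the matrix of UT is (UT)(e_j); compare with the product BA.
n = 3
e = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
cols = [U(T(ej)) for ej in e]
C = [[cols[j][i] for j in range(n)] for i in range(len(B))]
assert C == matmul(B, A)
print(C)
```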


We remark that Theorem 13 gives a proof that matrix multiplication is associative, a proof which requires no calculations and is independent of the proof we gave in Chapter 1. We should also point out that we proved a special case of Theorem 13 in Example 12.

It is important to note that if T and U are linear operators on a space V and we are representing by a single ordered basis $\mathcal{B}$, then Theorem 13 assumes the simple form $[UT]_{\mathcal{B}} = [U]_{\mathcal{B}}[T]_{\mathcal{B}}$. Thus in this case, the correspondence which $\mathcal{B}$ determines between operators and matrices is not only a vector space isomorphism but also preserves products. A simple consequence of this is that the linear operator T is invertible if and only if $[T]_{\mathcal{B}}$ is an invertible matrix. For, the identity operator I is represented by the identity matrix in any ordered basis, and thus
$$UT = TU = I$$
is equivalent to
$$[U]_{\mathcal{B}}[T]_{\mathcal{B}} = [T]_{\mathcal{B}}[U]_{\mathcal{B}} = I.$$
Of course, when T is invertible
$$[T^{-1}]_{\mathcal{B}} = [T]_{\mathcal{B}}^{-1}.$$


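Now we should like to inquire what happens to representing matrices when the ordered basis is changed. For the sake of simplicity, we shall consider this question only for linear operators on a space V, so that we can use a single ordered basis. The specific question is this. Let T be a linear operator on the finite-dimensional space V, and let
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\} \quad\text{and}\quad \mathcal{B}' = \{\alpha'_1, \ldots, \alpha'_n\}$$
be two ordered bases for V. How are the matrices $[T]_{\mathcal{B}}$ and $[T]_{\mathcal{B}'}$ related? As we observed in Chapter 2, there is a unique (invertible) $n \times n$ matrix P such that
$$[\alpha]_{\mathcal{B}} = P[\alpha]_{\mathcal{B}'} \tag{3-7}$$
for every vector $\alpha$ in V. It is the matrix $P = [P_1, \ldots, P_n]$ where $P_j = [\alpha'_j]_{\mathcal{B}}$. By definition
$$[T\alpha]_{\mathcal{B}} = [T]_{\mathcal{B}}[\alpha]_{\mathcal{B}}. \tag{3-8}$$
Applying (3-7) to the vector $T\alpha$, we have
$$[T\alpha]_{\mathcal{B}} = P[T\alpha]_{\mathcal{B}'}. \tag{3-9}$$
Combining (3-7), (3-8), and (3-9), we obtain
$$[T]_{\mathcal{B}}P[\alpha]_{\mathcal{B}'} = P[T\alpha]_{\mathcal{B}'}$$
or
$$P^{-1}[T]_{\mathcal{B}}P[\alpha]_{\mathcal{B}'} = [T\alpha]_{\mathcal{B}'}$$
and so it must be that
$$[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}}P. \tag{3-10}$$
This answers our question.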
Before stating this result formally, let us observe the following. There is a unique linear operator U which carries $\mathcal{B}$ onto $\mathcal{B}'$, defined by
$$U\alpha_j = \alpha'_j, \qquad j = 1, \ldots, n.$$
This operator U is invertible since it carries a basis for V onto a basis for V. The matrix P (above) is precisely the matrix of the operator U in the ordered basis $\mathcal{B}$. For, P is defined by
$$\alpha'_j = \sum_{i=1}^{n} P_{ij}\alpha_i$$
and since $U\alpha_j = \alpha'_j$, this equation can be written
$$U\alpha_j = \sum_{i=1}^{n} P_{ij}\alpha_i.$$
So $P = [U]_{\mathcal{B}}$, by definition.


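Theorem 14. Let V be a finite-dimensional vector space over the field F, and let
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\} \quad\text{and}\quad \mathcal{B}' = \{\alpha'_1, \ldots, \alpha'_n\}$$
be ordered bases for V. Suppose T is a linear operator on V. If $P = [P_1, \ldots, P_n]$ is the $n \times n$ matrix with columns $P_j = [\alpha'_j]_{\mathcal{B}}$, then
$$[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}}P.$$
Alternatively, if U is the invertible operator on V defined by $U\alpha_j = \alpha'_j$, $j = 1, \ldots, n$, then
$$[T]_{\mathcal{B}'} = [U]_{\mathcal{B}}^{-1}[T]_{\mathcal{B}}[U]_{\mathcal{B}}.$$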


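EXAMPLE 16. Let T be the linear operator on $R^2$ defined by $T(x_1, x_2) = (x_1, 0)$. In Example 14 we showed that the matrix of T in the standard ordered basis $\mathcal{B} = \{\epsilon_1, \epsilon_2\}$ is
$$[T]_{\mathcal{B}} = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
Suppose $\mathcal{B}'$ is the ordered basis for $R^2$ consisting of the vectors $\epsilon'_1 = (1, 1)$, $\epsilon'_2 = (2, 1)$. Then
$$\epsilon'_1 = \epsilon_1 + \epsilon_2, \qquad \epsilon'_2 = 2\epsilon_1 + \epsilon_2$$
so that P is the matrix
$$P = \begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix}.$$
By a short computation
$$P^{-1} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}.$$
Thus
$$[T]_{\mathcal{B}'} = P^{-1}[T]_{\mathcal{B}}P = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 2 \\ 1 & -1 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} -1 & -2 \\ 1 & 2 \end{bmatrix}.$$
We can easily check that this is correct because
$$\begin{aligned} T\epsilon'_1 &= (1, 0) = -\epsilon'_1 + \epsilon'_2 \\ T\epsilon'_2 &= (2, 0) = -2\epsilon'_1 + 2\epsilon'_2. \end{aligned}$$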
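The computation in Example 16 can be replayed exactly with rational arithmetic. This sketch computes $P^{-1}$ by Gauss-Jordan elimination (one convenient method, assumed here rather than taken from the text), forms $P^{-1}[T]_{\mathcal{B}}P$ as in (3-10), and then performs the same direct check as the text: column j of the result must rebuild $T\epsilon'_j$ from $\epsilon'_1, \epsilon'_2$. The helper names `matmul` and `inverse` are hypothetical.

```python
from fractions import Fraction

def matmul(X, Y):
    """Matrix product XY for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def inverse(M):
    """Inverse of an invertible square matrix by Gauss-Jordan elimination over the rationals."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)  # StopIteration if M is singular
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for i in range(n):
            if i != c and aug[i][c] != 0:
                f = aug[i][c]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]

T_B = [[1, 0], [0, 0]]    # [T]_B for T(x1, x2) = (x1, 0), B the standard basis
P = [[1, 2], [1, 1]]      # columns [eps'_1]_B = (1, 1) and [eps'_2]_B = (2, 1)
T_Bp = matmul(matmul(inverse(P), T_B), P)   # (3-10): [T]_B' = P^{-1} [T]_B P

# Direct check: column j of [T]_B' must rebuild T(eps'_j) from eps'_1, eps'_2.
new_basis = [(1, 1), (2, 1)]
T = lambda x: (x[0], 0)
for j, v in enumerate(new_basis):
    rebuilt = tuple(sum(T_Bp[i][j] * new_basis[i][k] for i in range(2)) for k in range(2))
    assert rebuilt == T(v)
```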


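EXAMPLE 17. Let V be the space of polynomial functions from R into R which have ‘degree’ less than or equal to 3. As in Example 15, let D be the differentiation operator on V, and let
$$\mathcal{B} = \{f_1, f_2, f_3, f_4\}$$
be the ordered basis for V defined by $f_i(x) = x^{i-1}$. Let t be a real number and define $g_i(x) = (x + t)^{i-1}$, that is
$$\begin{aligned} g_1 &= f_1 \\ g_2 &= tf_1 + f_2 \\ g_3 &= t^2f_1 + 2tf_2 + f_3 \\ g_4 &= t^3f_1 + 3t^2f_2 + 3tf_3 + f_4. \end{aligned}$$
Since the matrix
$$P = \begin{bmatrix} 1 & t & t^2 & t^3 \\ 0 & 1 & 2t & 3t^2 \\ 0 & 0 & 1 & 3t \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
is easily seen to be invertible with
$$P^{-1} = \begin{bmatrix} 1 & -t & t^2 & -t^3 \\ 0 & 1 & -2t & 3t^2 \\ 0 & 0 & 1 & -3t \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
it follows that $\mathcal{B}' = \{g_1, g_2, g_3, g_4\}$ is an ordered basis for V. In Example 15, we found that the matrix of D in the ordered basis $\mathcal{B}$ is
$$[D]_{\mathcal{B}} = \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$
The matrix of D in the ordered basis $\mathcal{B}'$ is thus
$$\begin{aligned} P^{-1}[D]_{\mathcal{B}}P &= \begin{bmatrix} 1 & -t & t^2 & -t^3 \\ 0 & 1 & -2t & 3t^2 \\ 0 & 0 & 1 & -3t \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} 1 & t & t^2 & t^3 \\ 0 & 1 & 2t & 3t^2 \\ 0 & 0 & 1 & 3t \\ 0 & 0 & 0 & 1 \end{bmatrix} \\ &= \begin{bmatrix} 1 & -t & t^2 & -t^3 \\ 0 & 1 & -2t & 3t^2 \\ 0 & 0 & 1 & -3t \\ 0 & 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} 0 & 1 & 2t & 3t^2 \\ 0 & 0 & 2 & 6t \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix} \\ &= \begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}. \end{aligned}$$
Thus D is represented by the same matrix in the ordered bases $\mathcal{B}$ and $\mathcal{B}'$. Of course, one can see this somewhat more directly since
$$Dg_1 = 0, \quad Dg_2 = g_1, \quad Dg_3 = 2g_2, \quad Dg_4 = 3g_3.$$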
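The identity $P^{-1}[D]_{\mathcal{B}}P = [D]_{\mathcal{B}}$ can be tested for particular values of t. In this sketch, which samples a few rational values of t and is therefore a check rather than a proof, the entries of P come from the binomial expansion $(x + t)^{j} = \sum_i \binom{j}{i} t^{j-i} x^i$, and the displayed $P^{-1}$ is obtained by replacing t with $-t$. Since P is invertible, $P^{-1}[D]P = [D]$ is equivalent to $[D]P = P[D]$, which avoids computing an inverse. The helper `change_matrix` is a hypothetical name.

```python
from fractions import Fraction
from math import comb

def matmul(X, Y):
    """Matrix product XY for matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def change_matrix(t, n=4):
    """P with columns [g_j]_B, where g_j(x) = (x + t)^(j-1) and f_i(x) = x^(i-1)."""
    # (x + t)^j = sum_i C(j, i) t^(j - i) x^i, so entry (i, j) is C(j, i) t^(j - i) for i <= j.
    return [[comb(j, i) * t ** (j - i) if i <= j else 0 for j in range(n)] for i in range(n)]

D = [[0, 1, 0, 0], [0, 0, 2, 0], [0, 0, 0, 3], [0, 0, 0, 0]]
I = [[int(i == j) for j in range(4)] for i in range(4)]

for t in (Fraction(3), Fraction(-1, 2), Fraction(7, 5)):
    P = change_matrix(t)
    P_inv = change_matrix(-t)            # the inverse displayed in the text: t replaced by -t
    assert matmul(P, P_inv) == I
    # P^{-1} [D] P = [D]  is equivalent to  [D] P = P [D]  because P is invertible.
    assert matmul(D, P) == matmul(P, D)
print("[D] is unchanged by the change of basis for every sampled t")
```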


This example illustrates a good point. If one knows the matrix of a linear 
operator in some ordered basis @ and wishes to find the matrix in another 
ordered basis @’, it is often most convenient to perform the coordinate 
change using the invertible matrix P; however, it may be a much simpler 
task to find the representing matrix by a direct appeal to its definition. 


Definition. Let A and B be $n \times n$ (square) matrices over the field F. We say that B is similar to A over F if there is an invertible $n \times n$ matrix P over F such that $B = P^{-1}AP$.


According to Theorem 14, we have the following: If V is an n-dimensional vector space over F and $\mathcal{B}$ and $\mathcal{B}'$ are two ordered bases for V, then for each linear operator T on V the matrix $B = [T]_{\mathcal{B}'}$ is similar to the matrix $A = [T]_{\mathcal{B}}$. The argument also goes in the other direction. Suppose A and B are $n \times n$ matrices and that B is similar to A. Let V be any n-dimensional space over F and let $\mathcal{B}$ be an ordered basis for V. Let T be the linear operator on V which is represented in the basis $\mathcal{B}$ by A. If $B = P^{-1}AP$, let $\mathcal{B}'$ be the ordered basis for V obtained from $\mathcal{B}$ by P, i.e.,
$$\alpha'_j = \sum_{i=1}^{n} P_{ij}\alpha_i.$$
Then the matrix of T in the ordered basis $\mathcal{B}'$ will be B.

Thus the statement that B is similar to A means that on each n-dimensional space over F the matrices A and B represent the same linear transformation in two (possibly) different ordered bases.

Note that each $n \times n$ matrix A is similar to itself, using $P = I$; if B is similar to A, then A is similar to B, for $B = P^{-1}AP$ implies that $A = (P^{-1})^{-1}BP^{-1}$; if B is similar to A and C is similar to B, then C is similar to A, for $B = P^{-1}AP$ and $C = Q^{-1}BQ$ imply that $C = (PQ)^{-1}A(PQ)$. Thus, similarity is an equivalence relation on the set of $n \times n$ matrices over the field F. Also note that the only matrix similar to the identity matrix I is I itself, and that the only matrix similar to the zero matrix is the zero matrix itself.
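The three identities behind "similarity is an equivalence relation" can be checked on concrete $2 \times 2$ matrices. In this sketch the matrices A, P, and Q are hypothetical choices, inverses use the $2 \times 2$ adjugate formula over the rationals, and `conj(A, P)` (an illustrative name) stands for $P^{-1}AP$. Checking one example does not prove the identities; the one-line algebra above does.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of 2 x 2 matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inv2(X):
    """Inverse of an invertible 2 x 2 matrix via the adjugate formula."""
    d = Fraction(X[0][0] * X[1][1] - X[0][1] * X[1][0])
    return [[X[1][1] / d, -X[0][1] / d], [-X[1][0] / d, X[0][0] / d]]

def conj(A, P):
    """P^{-1} A P, a matrix similar to A."""
    return matmul(matmul(inv2(P), A), P)

I = [[1, 0], [0, 1]]
A = [[1, 2], [3, 4]]        # hypothetical matrices for illustration
P = [[1, 1], [0, 1]]
Q = [[2, 1], [1, 1]]
B = conj(A, P)
C = conj(B, Q)

assert conj(A, I) == A                       # reflexive: P = I
assert conj(B, inv2(P)) == A                 # symmetric: A = (P^{-1})^{-1} B P^{-1}
assert C == conj(A, matmul(P, Q))            # transitive: C = (PQ)^{-1} A (PQ)
assert conj(I, P) == I                       # I is similar only to itself
```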
Exercises


1. Let T be the linear operator on $C^2$ defined by $T(x_1, x_2) = (x_1, 0)$. Let $\mathcal{B}$ be the standard ordered basis for $C^2$ and let $\mathcal{B}' = \{\alpha_1, \alpha_2\}$ be the ordered basis defined by $\alpha_1 = (1, i)$, $\alpha_2 = (-i, 2)$.

(a) What is the matrix of T relative to the pair $\mathcal{B}$, $\mathcal{B}'$?
(b) What is the matrix of T relative to the pair $\mathcal{B}'$, $\mathcal{B}$?
(c) What is the matrix of T in the ordered basis $\mathcal{B}'$?

(d) What is the matrix of T in the ordered basis $\{\alpha_2, \alpha_1\}$?


2. Let T be the linear transformation from $R^3$ into $R^2$ defined by
$$T(x_1, x_2, x_3) = (x_1 + x_2, 2x_3 - x_1).$$
(a) If $\mathcal{B}$ is the standard ordered basis for $R^3$ and $\mathcal{B}'$ is the standard ordered basis for $R^2$, what is the matrix of T relative to the pair $\mathcal{B}$, $\mathcal{B}'$?
(b) If $\mathcal{B} = \{\alpha_1, \alpha_2, \alpha_3\}$ and $\mathcal{B}' = \{\beta_1, \beta_2\}$, where
$$\alpha_1 = (1, 0, -1), \quad \alpha_2 = (1, 1, 1), \quad \alpha_3 = (1, 0, 0), \quad \beta_1 = (0, 1), \quad \beta_2 = (1, 0)$$
what is the matrix of T relative to the pair $\mathcal{B}$, $\mathcal{B}'$?


3. Let T be a linear operator on $F^n$, let A be the matrix of T in the standard ordered basis for $F^n$, and let W be the subspace of $F^n$ spanned by the column vectors of A. What does W have to do with T?


4. Let V be a two-dimensional vector space over the field F, and let $\mathcal{B}$ be an ordered basis for V. If T is a linear operator on V and
$$[T]_{\mathcal{B}} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
prove that $T^2 - (a + d)T + (ad - bc)I = 0$.


5. Let T be the linear operator on $R^3$, the matrix of which in the standard ordered basis is
$$A = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 1 & 1 \\ -1 & 3 & 4 \end{bmatrix}.$$
Find a basis for the range of T and a basis for the null space of T.
6. Let T be the linear operator on $R^2$ defined by
$$T(x_1, x_2) = (-x_2, x_1).$$
(a) What is the matrix of T in the standard ordered basis for $R^2$?
(b) What is the matrix of T in the ordered basis $\mathcal{B} = \{\alpha_1, \alpha_2\}$, where $\alpha_1 = (1, 2)$ and $\alpha_2 = (1, -1)$?
(c) Prove that for every real number c the operator $(T - cI)$ is invertible.
(d) Prove that if $\mathcal{B}$ is any ordered basis for $R^2$ and $[T]_{\mathcal{B}} = A$, then $A_{12}A_{21} \neq 0$.


7. Let T be the linear operator on $R^3$ defined by
$$T(x_1, x_2, x_3) = (3x_1 + x_3, -2x_1 + x_2, -x_1 + 2x_2 + 4x_3).$$
(a) What is the matrix of T in the standard ordered basis for $R^3$?
(b) What is the matrix of T in the ordered basis
$$\{\alpha_1, \alpha_2, \alpha_3\}$$
where $\alpha_1 = (1, 0, 1)$, $\alpha_2 = (-1, 2, 1)$, and $\alpha_3 = (2, 1, 1)$?
(c) Prove that T is invertible and give a rule for $T^{-1}$ like the one which defines T.


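8. Let θ be a real number. Prove that the following two matrices are similar over the field of complex numbers:
$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad \begin{bmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{bmatrix}.$$
(Hint: Let T be the linear operator on $C^2$ which is represented by the first matrix in the standard ordered basis. Then find vectors $\alpha_1$ and $\alpha_2$ such that $T\alpha_1 = e^{i\theta}\alpha_1$, $T\alpha_2 = e^{-i\theta}\alpha_2$, and $\{\alpha_1, \alpha_2\}$ is a basis.)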

9. Let V be a finite-dimensional vector space over the field F and let S and T be linear operators on V. We ask: When do there exist ordered bases $\mathcal{B}$ and $\mathcal{B}'$ for V such that $[S]_{\mathcal{B}} = [T]_{\mathcal{B}'}$? Prove that such bases exist if and only if there is an invertible linear operator U on V such that $T = USU^{-1}$. (Outline of proof: If $[S]_{\mathcal{B}} = [T]_{\mathcal{B}'}$, let U be the operator which carries $\mathcal{B}$ onto $\mathcal{B}'$ and show that $S = U^{-1}TU$. Conversely, if $T = USU^{-1}$ for some invertible U, let $\mathcal{B}$ be any ordered basis for V and let $\mathcal{B}'$ be its image under U. Then show that $[S]_{\mathcal{B}} = [T]_{\mathcal{B}'}$.)


10. We have seen that the linear operator T on $R^2$ defined by $T(x_1, x_2) = (x_1, 0)$ is represented in the standard ordered basis by the matrix
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$
This operator satisfies $T^2 = T$. Prove that if S is a linear operator on $R^2$ such that $S^2 = S$, then $S = 0$, or $S = I$, or there is an ordered basis $\mathcal{B}$ for $R^2$ such that $[S]_{\mathcal{B}} = A$ (above).


11. Let W be the space of all $n \times 1$ column matrices over a field F. If A is an $n \times n$ matrix over F, then A defines a linear operator $L_A$ on W through left multiplication: $L_A(X) = AX$. Prove that every linear operator on W is left multiplication by some $n \times n$ matrix, i.e., is $L_A$ for some A.

Now suppose V is an n-dimensional vector space over the field F, and let $\mathcal{B}$ be an ordered basis for V. For each $\alpha$ in V, define $U\alpha = [\alpha]_{\mathcal{B}}$. Prove that U is an isomorphism of V onto W. If T is a linear operator on V, then $UTU^{-1}$ is a linear operator on W. Accordingly, $UTU^{-1}$ is left multiplication by some $n \times n$ matrix A. What is A?


12. Let V be an n-dimensional vector space over the field F, and let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be an ordered basis for V.
(a) According to Theorem 1, there is a unique linear operator T on V such that
$$T\alpha_j = \alpha_{j+1}, \quad j = 1, \ldots, n - 1, \qquad T\alpha_n = 0.$$
What is the matrix A of T in the ordered basis $\mathcal{B}$?

(b) Prove that $T^n = 0$ but $T^{n-1} \neq 0$.

(c) Let S be any linear operator on V such that $S^n = 0$ but $S^{n-1} \neq 0$. Prove that there is an ordered basis $\mathcal{B}'$ for V such that the matrix of S in the ordered basis $\mathcal{B}'$ is the matrix A of part (a).
(d) Prove that if M and N are $n \times n$ matrices over F such that $M^n = N^n = 0$ but $M^{n-1} \neq 0 \neq N^{n-1}$, then M and N are similar.


13. Let V and W be finite-dimensional vector spaces over the field F and let T be a linear transformation from V into W. If
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\} \quad\text{and}\quad \mathcal{B}' = \{\beta_1, \ldots, \beta_m\}$$
are ordered bases for V and W, respectively, define the linear transformations $E^{p,q}$ as in the proof of Theorem 5: $E^{p,q}(\alpha_i) = \delta_{iq}\beta_p$. Then the $E^{p,q}$, $1 \le p \le m$, $1 \le q \le n$, form a basis for L(V, W), and so
$$T = \sum_{p=1}^{m}\sum_{q=1}^{n} A_{pq}E^{p,q}$$
for certain scalars $A_{pq}$ (the coordinates of T in this basis for L(V, W)). Show that the matrix A with entries $A(p, q) = A_{pq}$ is precisely the matrix of T relative to the pair $\mathcal{B}$, $\mathcal{B}'$.
3.5. Linear Functionals

If V is a vector space over the field F, a linear transformation f from V into the scalar field F is also called a linear functional on V. If we start from scratch, this means that f is a function from V into F such that
$$f(c\alpha + \beta) = cf(\alpha) + f(\beta)$$
for all vectors $\alpha$ and $\beta$ in V and all scalars c in F. The concept of linear functional is important in the study of finite-dimensional spaces because it helps to organize and clarify the discussion of subspaces, linear equations, and coordinates.


EXAMPLE 18. Let $F$ be a field and let $a_1, \ldots, a_n$ be scalars in $F$. Define a function $f$ on $F^n$ by
$$f(x_1, \ldots, x_n) = a_1x_1 + \cdots + a_nx_n.$$
Then $f$ is a linear functional on $F^n$. It is the linear functional which is represented by the matrix $[a_1 \cdots a_n]$ relative to the standard ordered basis for $F^n$ and the basis $\{1\}$ for $F$:
$$a_j = f(\epsilon_j), \quad j = 1, \ldots, n.$$
Every linear functional on $F^n$ is of this form, for some scalars $a_1, \ldots, a_n$. That is immediate from the definition of linear functional because we define $a_j = f(\epsilon_j)$ and use the linearity
$$f(x_1, \ldots, x_n) = f\Big(\sum_j x_j\epsilon_j\Big) = \sum_j x_jf(\epsilon_j) = \sum_j a_jx_j.$$
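Example 18 can be made concrete with a minimal Python sketch, assuming a functional on $F^n$ is modelled as an ordinary Python function on lists, with $F = \mathbb{Q}$ represented by `fractions.Fraction`. It recovers the row matrix $[a_1 \cdots a_n]$ by evaluating $f$ at the standard basis vectors, exactly as $a_j = f(\epsilon_j)$. The names `standard_basis`, `coefficients` and `sample_f` are hypothetical, and `sample_f` is an illustrative functional, not one from the text.

```python
from fractions import Fraction


def standard_basis(n):
    """The standard basis e_1, ..., e_n of F^n, as lists."""
    return [[Fraction(1) if k == j else Fraction(0) for k in range(n)] for j in range(n)]


def coefficients(f, n):
    """Return [a_1, ..., a_n] with a_j = f(e_j), assuming f is linear on F^n."""
    return [f(e) for e in standard_basis(n)]


def sample_f(x):
    """A hypothetical linear functional on Q^3: f(x) = 3x_1 - x_2 + 2x_3."""
    return 3 * x[0] - x[1] + 2 * x[2]


a = coefficients(sample_f, 3)
x = [Fraction(1, 2), Fraction(4), Fraction(-1)]
# Linearity gives f(x) = a_1 x_1 + ... + a_n x_n for every x.
assert sample_f(x) == sum(aj * xj for aj, xj in zip(a, x))
```

Evaluating at the $n$ standard basis vectors is all that is needed, because linearity determines $f$ everywhere else.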
EXAMPLE 19. Here is an important example of a linear functional. Let $n$ be a positive integer and $F$ a field. If $A$ is an $n \times n$ matrix with entries in $F$, the trace of $A$ is the scalar
$$\operatorname{tr} A = A_{11} + A_{22} + \cdots + A_{nn}.$$
The trace function is a linear functional on the matrix space $F^{n \times n}$ because
$$\operatorname{tr}(cA + B) = \sum_{i=1}^n (cA_{ii} + B_{ii}) = c\sum_{i=1}^n A_{ii} + \sum_{i=1}^n B_{ii} = c\operatorname{tr} A + \operatorname{tr} B.$$
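The linearity of the trace can be spot-checked numerically. The sketch below represents matrices as plain Python lists of rows; `trace` and `combine` are illustrative helper names, and a check of one instance is of course not a proof.

```python
from fractions import Fraction


def trace(A):
    """tr A = A_11 + A_22 + ... + A_nn for a square matrix A (list of rows)."""
    return sum(A[i][i] for i in range(len(A)))


def combine(c, A, B):
    """The matrix cA + B, computed entry by entry."""
    return [[c * a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
c = Fraction(3)
# tr(cA + B) = c tr A + tr B
assert trace(combine(c, A, B)) == c * trace(A) + trace(B)
```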


EXAMPLE 20. Let $V$ be the space of all polynomial functions from the field $F$ into itself. Let $t$ be an element of $F$. If we define
$$L_t(p) = p(t)$$
then $L_t$ is a linear functional on $V$. One usually describes this by saying that, for each $t$, 'evaluation at $t$' is a linear functional on the space of polynomial functions. Perhaps we should remark that the fact that the functions are polynomials plays no role in this example. Evaluation at $t$ is a linear functional on the space of all functions from $F$ into $F$.


EXAMPLE 21. This may be the most important linear functional in mathematics. Let $[a, b]$ be a closed interval on the real line and let $C([a, b])$ be the space of continuous real-valued functions on $[a, b]$. Then
$$L(g) = \int_a^b g(t)\,dt$$
defines a linear functional $L$ on $C([a, b])$.

If V is a vector space, the collection of all linear functionals on V 
forms a vector space in a natural way. It is the space L(V, F). We denote 
this space by V* and call it the dual space of V: 


V* = L(V, F). 


If V is finite-dimensional, we can obtain a rather explicit description 
of the dual space V*. From Theorem 5 we know something about the 
space V*, namely that 


dim V* = dim V. 


Let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be a basis for $V$. According to Theorem 1, there is (for each $i$) a unique linear functional $f_i$ on $V$ such that
$$f_i(\alpha_j) = \delta_{ij}. \tag{3-11}$$
In this way we obtain from $\mathcal{B}$ a set of $n$ distinct linear functionals $f_1, \ldots, f_n$ on $V$. These functionals are also linearly independent. For, suppose
$$f = \sum_{i=1}^n c_if_i. \tag{3-12}$$
Then
$$f(\alpha_j) = \sum_{i=1}^n c_if_i(\alpha_j) = \sum_{i=1}^n c_i\delta_{ij} = c_j.$$
In particular, if $f$ is the zero functional, $f(\alpha_j) = 0$ for each $j$ and hence the scalars $c_j$ are all 0. Now $f_1, \ldots, f_n$ are $n$ linearly independent functionals, and since we know that $V^*$ has dimension $n$, it must be that $\mathcal{B}^* = \{f_1, \ldots, f_n\}$ is a basis for $V^*$. This basis is called the dual basis of $\mathcal{B}$.


Theorem 15. Let $V$ be a finite-dimensional vector space over the field $F$, and let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be a basis for $V$. Then there is a unique dual basis $\mathcal{B}^* = \{f_1, \ldots, f_n\}$ for $V^*$ such that $f_i(\alpha_j) = \delta_{ij}$. For each linear functional $f$ on $V$ we have
$$f = \sum_{i=1}^n f(\alpha_i)f_i \tag{3-13}$$
and for each vector $\alpha$ in $V$ we have
$$\alpha = \sum_{i=1}^n f_i(\alpha)\alpha_i. \tag{3-14}$$

Proof. We have shown above that there is a unique basis which is 'dual' to $\mathcal{B}$. If $f$ is a linear functional on $V$, then $f$ is some linear combination (3-12) of the $f_i$, and as we observed after (3-12) the scalars $c_j$ must be given by $c_j = f(\alpha_j)$. Similarly, if
$$\alpha = \sum_{i=1}^n x_i\alpha_i$$
is a vector in $V$, then
$$f_j(\alpha) = \sum_{i=1}^n x_if_j(\alpha_i) = \sum_{i=1}^n x_i\delta_{ji} = x_j$$
so that the unique expression for $\alpha$ as a linear combination of the $\alpha_i$ is
$$\alpha = \sum_{i=1}^n f_i(\alpha)\alpha_i.$$
∎
i= 


Equation (3-14) provides us with a nice way of describing what the dual basis is. It says, if $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ is an ordered basis for $V$ and $\mathcal{B}^* = \{f_1, \ldots, f_n\}$ is the dual basis, then $f_i$ is precisely the function which assigns to each vector $\alpha$ in $V$ the $i$th coordinate of $\alpha$ relative to the ordered basis $\mathcal{B}$. Thus we may also call the $f_i$ the coordinate functions for $\mathcal{B}$. The formula (3-13), when combined with (3-14), tells us the following: If $f$ is in $V^*$, and we let $f(\alpha_i) = a_i$, then when
$$\alpha = x_1\alpha_1 + \cdots + x_n\alpha_n$$
we have
$$f(\alpha) = a_1x_1 + \cdots + a_nx_n. \tag{3-15}$$
In other words, if we choose an ordered basis $\mathcal{B}$ for $V$ and describe each vector in $V$ by its $n$-tuple of coordinates $(x_1, \ldots, x_n)$ relative to $\mathcal{B}$, then every linear functional on $V$ has the form (3-15). This is the natural generalization of Example 18, which is the special case $V = F^n$ and $\mathcal{B} = \{\epsilon_1, \ldots, \epsilon_n\}$.
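For $V = F^n$ the dual basis can be computed mechanically. If the basis vectors $\alpha_1, \ldots, \alpha_n$ are the columns of an invertible matrix $P$, the conditions $f_i(\alpha_j) = \delta_{ij}$ say exactly that the coefficient rows of $f_1, \ldots, f_n$ form the matrix $P^{-1}$. The Python sketch below assumes exact rational arithmetic (`fractions.Fraction`) and inverts $P$ by Gauss-Jordan elimination; the helper names `inverse`, `dual_basis` and `apply` are hypothetical, and the basis used is an illustrative one, not taken from the text. It also checks (3-14), $\alpha = \sum_i f_i(\alpha)\alpha_i$.

```python
from fractions import Fraction


def inverse(P):
    """Invert a square matrix (list of rows) by Gauss-Jordan elimination over Q."""
    n = len(P)
    # Augment P with the identity matrix: [P | I].
    M = [[Fraction(x) for x in row] + [Fraction(int(i == k)) for k in range(n)]
         for i, row in enumerate(P)]
    for c in range(n):
        p = next((r for r in range(c, n) if M[r][c] != 0), None)
        if p is None:
            raise ValueError("matrix is singular, so the vectors are not a basis")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [x / pivot for x in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def dual_basis(basis):
    """Coefficient rows of f_1, ..., f_n with f_i(alpha_j) = delta_ij."""
    n = len(basis)
    P = [[basis[j][i] for j in range(n)] for i in range(n)]  # alpha_j is column j
    return inverse(P)


def apply(f, v):
    """Evaluate the functional with coefficient row f at the vector v."""
    return sum(a * x for a, x in zip(f, v))


basis = [[1, 0, 0], [1, 1, 0], [1, 1, 1]]   # an illustrative basis of Q^3
fs = dual_basis(basis)
for i in range(3):
    for j in range(3):
        assert apply(fs[i], basis[j]) == (1 if i == j else 0)

# (3-14): alpha = sum_i f_i(alpha) alpha_i
alpha = [Fraction(2), Fraction(-3), Fraction(5)]
coords = [apply(f, alpha) for f in fs]
rebuilt = [sum(coords[i] * basis[i][k] for i in range(3)) for k in range(3)]
assert rebuilt == alpha
```

The values `coords` are the coordinates of $\alpha$ relative to the basis, which is the sense in which the $f_i$ are "coordinate functions".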


EXAMPLE 22. Let $V$ be the vector space of all polynomial functions from $R$ into $R$ which have degree less than or equal to 2. Let $t_1$, $t_2$ and $t_3$ be any three distinct real numbers, and let
$$L_i(p) = p(t_i).$$
Then $L_1$, $L_2$ and $L_3$ are linear functionals on $V$. These functionals are linearly independent; for, suppose
$$L = c_1L_1 + c_2L_2 + c_3L_3.$$
If $L = 0$, i.e., if $L(p) = 0$ for each $p$ in $V$, then applying $L$ to the particular polynomial 'functions' $1, x, x^2$, we obtain
$$\begin{aligned} c_1 + c_2 + c_3 &= 0\\ t_1c_1 + t_2c_2 + t_3c_3 &= 0\\ t_1^2c_1 + t_2^2c_2 + t_3^2c_3 &= 0. \end{aligned}$$
From this it follows that $c_1 = c_2 = c_3 = 0$, because (as a short computation shows) the matrix
$$\begin{bmatrix} 1 & 1 & 1\\ t_1 & t_2 & t_3\\ t_1^2 & t_2^2 & t_3^2 \end{bmatrix}$$
has determinant $(t_2 - t_1)(t_3 - t_1)(t_3 - t_2)$ and so is invertible when $t_1$, $t_2$ and $t_3$ are distinct. Now the $L_i$ are independent, and since $V$ has dimension 3, these functionals form a basis for $V^*$. What is the basis for $V$, of which this is the dual? Such a basis $\{p_1, p_2, p_3\}$ for $V$ must satisfy
$$L_i(p_j) = \delta_{ij}$$
or
$$p_j(t_i) = \delta_{ij}.$$


These polynomial functions are rather easily seen to be
$$p_1(x) = \frac{(x - t_2)(x - t_3)}{(t_1 - t_2)(t_1 - t_3)}$$
$$p_2(x) = \frac{(x - t_1)(x - t_3)}{(t_2 - t_1)(t_2 - t_3)}$$
$$p_3(x) = \frac{(x - t_1)(x - t_2)}{(t_3 - t_1)(t_3 - t_2)}.$$
The basis $\{p_1, p_2, p_3\}$ for $V$ is interesting, because according to (3-14) we have for each $p$ in $V$
$$p = p(t_1)p_1 + p(t_2)p_2 + p(t_3)p_3.$$
Thus, if $c_1$, $c_2$, and $c_3$ are any real numbers, there is exactly one polynomial function $p$ over $R$ which has degree at most 2 and satisfies $p(t_j) = c_j$, $j = 1, 2, 3$. This polynomial function is $p = c_1p_1 + c_2p_2 + c_3p_3$.
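The basis $\{p_1, p_2, p_3\}$ is the Lagrange interpolation basis, and the same formulas work for any number of distinct points. The following Python sketch (the helper names `lagrange_basis` and `interpolate` are hypothetical; values are exact rationals) builds the $p_j$, checks $p_j(t_i) = \delta_{ij}$, and forms $p = \sum_j c_jp_j$. The sample points and values are illustrative.

```python
from fractions import Fraction


def lagrange_basis(ts):
    """Return functions p_1, ..., p_n of degree < n with p_j(t_i) = delta_ij."""
    ts = [Fraction(t) for t in ts]
    if len(set(ts)) != len(ts):
        raise ValueError("the points t_1, ..., t_n must be distinct")

    def make(j):
        def p(x):
            x = Fraction(x)
            value = Fraction(1)
            for k, tk in enumerate(ts):
                if k != j:
                    value *= (x - tk) / (ts[j] - tk)
            return value
        return p

    return [make(j) for j in range(len(ts))]


def interpolate(ts, cs):
    """The unique p of degree < len(ts) with p(t_j) = c_j, namely sum_j c_j p_j."""
    basis = lagrange_basis(ts)
    return lambda x: sum(Fraction(c) * p(x) for c, p in zip(cs, basis))


ts = [0, 1, 2]
ps = lagrange_basis(ts)
for i, ti in enumerate(ts):
    for j, pj in enumerate(ps):
        assert pj(ti) == (1 if i == j else 0)

p = interpolate(ts, [0, 1, 4])   # the values of x**2 at 0, 1, 2
assert all(p(x) == x * x for x in range(-3, 4))
```

By the uniqueness statement above, the interpolant of the values of $x^2$ must be $x^2$ itself, which is what the last assertion checks.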

Now let us discuss the relationship between linear functionals and 
subspaces. If f is a non-zero linear functional, then the rank of f is 1 because 
the range of f is a non-zero subspace of the scalar field and must (therefore) 
be the scalar field. If the underlying space V is finite-dimensional, the rank 
plus nullity theorem (Theorem 2) tells us that the null space $N_f$ has dimension
$$\dim N_f = \dim V - 1.$$


In a vector space of dimension n, a subspace of dimension n — 1 is called 
a hyperspace. Such spaces are sometimes called hyperplanes or subspaces 
of codimension 1. Is every hyperspace the null space of a linear functional? 
The answer is easily seen to be yes. It is not much more difficult to show 
that each d-dimensional subspace of an n-dimensional space is the inter- 
section of the null spaces of (n — d) linear functionals (Theorem 16 below). 


Definition. If $V$ is a vector space over the field $F$ and $S$ is a subset of $V$, the annihilator of $S$ is the set $S^0$ of linear functionals $f$ on $V$ such that $f(\alpha) = 0$ for every $\alpha$ in $S$.

It should be clear to the reader that $S^0$ is a subspace of $V^*$, whether $S$ is a subspace of $V$ or not. If $S$ is the set consisting of the zero vector alone, then $S^0 = V^*$. If $S = V$, then $S^0$ is the zero subspace of $V^*$. (This is easy to see when $V$ is finite-dimensional.)


Theorem 16. Let $V$ be a finite-dimensional vector space over the field $F$, and let $W$ be a subspace of $V$. Then
$$\dim W + \dim W^0 = \dim V.$$

Proof. Let $k$ be the dimension of $W$ and $\{\alpha_1, \ldots, \alpha_k\}$ a basis for $W$. Choose vectors $\alpha_{k+1}, \ldots, \alpha_n$ in $V$ such that $\{\alpha_1, \ldots, \alpha_n\}$ is a basis for $V$. Let $\{f_1, \ldots, f_n\}$ be the basis for $V^*$ which is dual to this basis for $V$. The claim is that $\{f_{k+1}, \ldots, f_n\}$ is a basis for the annihilator $W^0$. Certainly $f_i$ belongs to $W^0$ for $i \ge k + 1$, because
$$f_i(\alpha_j) = \delta_{ij}$$
and $\delta_{ij} = 0$ if $i \ge k + 1$ and $j \le k$; from this it follows that, for $i \ge k + 1$, $f_i(\alpha) = 0$ whenever $\alpha$ is a linear combination of $\alpha_1, \ldots, \alpha_k$. The functionals $f_{k+1}, \ldots, f_n$ are independent, so all we must show is that they span $W^0$. Suppose $f$ is in $V^*$. Now
$$f = \sum_{i=1}^n f(\alpha_i)f_i$$
so that if $f$ is in $W^0$ we have $f(\alpha_i) = 0$ for $i \le k$ and
$$f = \sum_{i=k+1}^n f(\alpha_i)f_i.$$
We have shown that if $\dim W = k$ and $\dim V = n$ then $\dim W^0 = n - k$. ∎


Corollary. If $W$ is a $k$-dimensional subspace of an $n$-dimensional vector space $V$, then $W$ is the intersection of $(n - k)$ hyperspaces in $V$.

Proof. This is a corollary of the proof of Theorem 16 rather than its statement. In the notation of the proof, $W$ is exactly the set of vectors $\alpha$ such that $f_i(\alpha) = 0$, $i = k + 1, \ldots, n$. In case $k = n - 1$, $W$ is the null space of $f_n$. ∎

Corollary. If $W_1$ and $W_2$ are subspaces of a finite-dimensional vector space, then $W_1 = W_2$ if and only if $W_1^0 = W_2^0$.

Proof. If $W_1 = W_2$, then of course $W_1^0 = W_2^0$. If $W_1 \neq W_2$, then one of the two subspaces contains a vector which is not in the other. Suppose there is a vector $\alpha$ which is in $W_2$ but not in $W_1$. By the previous corollaries (or the proof of Theorem 16) there is a linear functional $f$ such that $f(\beta) = 0$ for all $\beta$ in $W_1$, but $f(\alpha) \neq 0$. Then $f$ is in $W_1^0$ but not in $W_2^0$, and $W_1^0 \neq W_2^0$. ∎


In the next section we shall give different proofs for these two corol- 
laries. The first corollary says that, if we select some ordered basis for the 
space, each k-dimensional subspace can be described by specifying (n — k) 
homogeneous linear conditions on the coordinates relative to that basis. 

Let us look briefly at systems of homogeneous linear equations from the point of view of linear functionals. Suppose we have a system of linear equations,
$$\begin{array}{ccccc} A_{11}x_1 & + \cdots + & A_{1n}x_n & = & 0\\ \vdots & & \vdots & & \vdots\\ A_{m1}x_1 & + \cdots + & A_{mn}x_n & = & 0 \end{array}$$
for which we wish to find the solutions. If we let $f_i$, $i = 1, \ldots, m$, be the linear functional on $F^n$ defined by
$$f_i(x_1, \ldots, x_n) = A_{i1}x_1 + \cdots + A_{in}x_n$$
then we are seeking the subspace of $F^n$ of all $\alpha$ such that
$$f_i(\alpha) = 0, \quad i = 1, \ldots, m.$$
In other words, we are seeking the subspace annihilated by $f_1, \ldots, f_m$. Row-reduction of the coefficient matrix provides us with a systematic method of finding this subspace. The $n$-tuple $(A_{i1}, \ldots, A_{in})$ gives the coordinates of the linear functional $f_i$ relative to the basis which is dual to the standard basis for $F^n$. The row space of the coefficient matrix may thus be regarded as the space of linear functionals spanned by $f_1, \ldots, f_m$. The solution space is the subspace annihilated by this space of functionals.

Now one may look at the system of equations from the 'dual' point of view. That is, suppose that we are given $m$ vectors in $F^n$
$$\alpha_i = (A_{i1}, \ldots, A_{in})$$
and we wish to find the annihilator of the subspace spanned by these vectors. Since a typical linear functional on $F^n$ has the form
$$f(x_1, \ldots, x_n) = c_1x_1 + \cdots + c_nx_n$$
the condition that $f$ be in this annihilator is that
$$\sum_{j=1}^n A_{ij}c_j = 0, \quad i = 1, \ldots, m$$
that is, that $(c_1, \ldots, c_n)$ be a solution of the system $AX = 0$. From this point of view, row-reduction gives us a systematic method of finding the annihilator of the subspace spanned by a given finite set of vectors in $F^n$.


EXAMPLE 23. Here are three linear functionals on $R^4$:
$$\begin{aligned} f_1(x_1, x_2, x_3, x_4) &= x_1 + 2x_2 + 2x_3 + x_4\\ f_2(x_1, x_2, x_3, x_4) &= 2x_2 + x_4\\ f_3(x_1, x_2, x_3, x_4) &= -2x_1 - 4x_3 + 3x_4. \end{aligned}$$
The subspace which they annihilate may be found explicitly by finding the row-reduced echelon form of the matrix
$$A = \begin{bmatrix} 1 & 2 & 2 & 1\\ 0 & 2 & 0 & 1\\ -2 & 0 & -4 & 3 \end{bmatrix}.$$
A short calculation, or a peek at Example 21 of Chapter 2, shows that
$$R = \begin{bmatrix} 1 & 0 & 2 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}.$$
Therefore, the linear functionals
$$\begin{aligned} g_1(x_1, x_2, x_3, x_4) &= x_1 + 2x_3\\ g_2(x_1, x_2, x_3, x_4) &= x_2\\ g_3(x_1, x_2, x_3, x_4) &= x_4 \end{aligned}$$
span the same subspace of $(R^4)^*$ and annihilate the same subspace of $R^4$ as do $f_1, f_2, f_3$. The subspace annihilated consists of the vectors with
$$x_1 = -2x_3, \qquad x_2 = x_4 = 0.$$
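Since row-reduction is the whole computation here, a short routine may help. The Python sketch below, with hypothetical helpers `rref` and `solution_space` working over $\mathbb{Q}$, reproduces the matrix $R$ of Example 23 and reads off the annihilated subspace: one basis vector for each non-pivot (free) column, obtained by setting that unknown to 1 and the other free unknowns to 0.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the row-reduced echelon form of a matrix and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    m = len(R)
    n = len(R[0]) if R else 0
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue                      # no pivot in this column
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]  # make the leading entry 1
        for i in range(m):
            if i != r and R[i][c] != 0:   # clear the rest of column c
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def solution_space(rows):
    """A basis for the subspace of F^n annihilated by the functionals given as rows."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (col for col in range(n) if col not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][free]            # solve row i of RX = 0 for its pivot unknown
        basis.append(v)
    return basis


A = [[1, 2, 2, 1], [0, 2, 0, 1], [-2, 0, -4, 3]]
R, _ = rref(A)
assert R == [[1, 0, 2, 0], [0, 1, 0, 0], [0, 0, 0, 1]]
assert solution_space(A) == [[-2, 0, 1, 0]]   # x1 = -2 x3, x2 = x4 = 0
```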


EXAMPLE 24. Let $W$ be the subspace of $R^5$ which is spanned by the vectors
$$\begin{aligned} \alpha_1 &= (2, -2, 3, 4, -1), & \alpha_3 &= (0, 0, -1, -2, 3)\\ \alpha_2 &= (-1, 1, 2, 5, 2), & \alpha_4 &= (1, -1, 2, 3, 0). \end{aligned}$$
How does one describe $W^0$, the annihilator of $W$? Let us form the $4 \times 5$ matrix $A$ with row vectors $\alpha_1, \alpha_2, \alpha_3, \alpha_4$, and find the row-reduced echelon matrix $R$ which is row-equivalent to $A$:
$$A = \begin{bmatrix} 2 & -2 & 3 & 4 & -1\\ -1 & 1 & 2 & 5 & 2\\ 0 & 0 & -1 & -2 & 3\\ 1 & -1 & 2 & 3 & 0 \end{bmatrix} \rightarrow R = \begin{bmatrix} 1 & -1 & 0 & -1 & 0\\ 0 & 0 & 1 & 2 & 0\\ 0 & 0 & 0 & 0 & 1\\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.$$
If $f$ is a linear functional on $R^5$:
$$f(x_1, \ldots, x_5) = \sum_{j=1}^5 c_jx_j$$
then $f$ is in $W^0$ if and only if $f(\alpha_i) = 0$, $i = 1, 2, 3, 4$, i.e., if and only if
$$\sum_{j=1}^5 A_{ij}c_j = 0, \quad 1 \le i \le 4.$$
This is equivalent to
$$\sum_{j=1}^5 R_{ij}c_j = 0, \quad 1 \le i \le 3$$
or
$$\begin{aligned} c_1 - c_2 - c_4 &= 0\\ c_3 + 2c_4 &= 0\\ c_5 &= 0. \end{aligned}$$
We obtain all such linear functionals $f$ by assigning arbitrary values to $c_2$ and $c_4$, say $c_2 = a$ and $c_4 = b$, and then finding the corresponding $c_1 = a + b$, $c_3 = -2b$, $c_5 = 0$. So $W^0$ consists of all linear functionals $f$ of the form
$$f(x_1, x_2, x_3, x_4, x_5) = (a + b)x_1 + ax_2 - 2bx_3 + bx_4.$$
The dimension of $W^0$ is 2 and a basis $\{f_1, f_2\}$ for $W^0$ can be found by first taking $a = 1$, $b = 0$ and then $a = 0$, $b = 1$:
$$\begin{aligned} f_1(x_1, \ldots, x_5) &= x_1 + x_2\\ f_2(x_1, \ldots, x_5) &= x_1 - 2x_3 + x_4. \end{aligned}$$
The above general $f$ in $W^0$ is $f = af_1 + bf_2$.
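The same elimination computes annihilators. Taking the spanning vectors as the rows of $A$, a functional with coefficients $(c_1, \ldots, c_n)$ lies in $W^0$ exactly when $AX = 0$. The Python sketch below (the helper names `rref` and `annihilator` are hypothetical) recovers the basis $\{f_1, f_2\}$ of Example 24 and checks the dimension count of Theorem 16, $\dim W + \dim W^0 = \dim V$, using the number of pivots of $R$ as $\dim W$.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the row-reduced echelon form of a matrix and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    m = len(R)
    n = len(R[0]) if R else 0
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def annihilator(vectors):
    """Return (basis of W^0 as coefficient rows, dim W) for W = span(vectors)."""
    R, pivots = rref(vectors)
    n = len(vectors[0])
    basis = []
    for free in (col for col in range(n) if col not in pivots):
        coeffs = [Fraction(0)] * n
        coeffs[free] = Fraction(1)          # this free unknown is 1, the others 0
        for i, p in enumerate(pivots):
            coeffs[p] = -R[i][free]         # solve row i of RX = 0 for its pivot unknown
        basis.append(coeffs)
    return basis, len(pivots)


A = [[2, -2, 3, 4, -1], [-1, 1, 2, 5, 2], [0, 0, -1, -2, 3], [1, -1, 2, 3, 0]]
W0_basis, dim_W = annihilator(A)
assert W0_basis == [[1, 1, 0, 0, 0], [1, 0, -2, 1, 0]]   # f_1 and f_2 of Example 24
assert dim_W + len(W0_basis) == 5                         # Theorem 16
```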


Exercises 
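1. In $R^3$, let $\alpha_1 = (1, 0, 1)$, $\alpha_2 = (0, 1, -2)$, $\alpha_3 = (-1, -1, 0)$.
(a) If $f$ is a linear functional on $R^3$ such that
$$f(\alpha_1) = 1, \quad f(\alpha_2) = -1, \quad f(\alpha_3) = 3,$$
and if $\alpha = (a, b, c)$, find $f(\alpha)$.
(b) Describe explicitly a linear functional $f$ on $R^3$ such that
$$f(\alpha_1) = f(\alpha_2) = 0 \quad \text{but} \quad f(\alpha_3) \neq 0.$$
(c) Let $f$ be any linear functional such that
$$f(\alpha_1) = f(\alpha_2) = 0 \quad \text{and} \quad f(\alpha_3) \neq 0.$$
If $\alpha = (2, 3, -1)$, show that $f(\alpha) \neq 0$.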
2. Let $\mathcal{B} = \{\alpha_1, \alpha_2, \alpha_3\}$ be the basis for $C^3$ defined by
$$\alpha_1 = (1, 0, -1), \quad \alpha_2 = (1, 1, 1), \quad \alpha_3 = (2, 2, 0).$$
Find the dual basis of $\mathcal{B}$.


3. If $A$ and $B$ are $n \times n$ matrices over the field $F$, show that trace $(AB)$ = trace $(BA)$. Now show that similar matrices have the same trace.


4. Let $V$ be the vector space of all polynomial functions $p$ from $R$ into $R$ which have degree 2 or less:
$$p(x) = c_0 + c_1x + c_2x^2.$$
Define three linear functionals on $V$ by
$$f_1(p) = \int_0^1 p(x)\,dx, \quad f_2(p) = \int_0^2 p(x)\,dx, \quad f_3(p) = \int_0^{-1} p(x)\,dx.$$
Show that $\{f_1, f_2, f_3\}$ is a basis for $V^*$ by exhibiting the basis for $V$ of which it is the dual.

5. If $A$ and $B$ are $n \times n$ complex matrices, show that $AB - BA = I$ is impossible.


6. Let $m$ and $n$ be positive integers and $F$ a field. Let $f_1, \ldots, f_m$ be linear functionals on $F^n$. For $\alpha$ in $F^n$ define
$$T\alpha = (f_1(\alpha), \ldots, f_m(\alpha)).$$
Show that $T$ is a linear transformation from $F^n$ into $F^m$. Then show that every linear transformation from $F^n$ into $F^m$ is of the above form, for some $f_1, \ldots, f_m$.
7. Let $\alpha_1 = (1, 0, -1, 2)$ and $\alpha_2 = (2, 3, 1, 1)$, and let $W$ be the subspace of $R^4$ spanned by $\alpha_1$ and $\alpha_2$. Which linear functionals $f$:
$$f(x_1, x_2, x_3, x_4) = c_1x_1 + c_2x_2 + c_3x_3 + c_4x_4$$
are in the annihilator of $W$?
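8. Let $W$ be the subspace of $R^5$ which is spanned by the vectors
$$\begin{aligned} \alpha_1 &= \epsilon_1 + 2\epsilon_2 + \epsilon_3, & \alpha_2 &= \epsilon_2 + 3\epsilon_3 + 3\epsilon_4 + \epsilon_5\\ \alpha_3 &= \epsilon_1 + 4\epsilon_2 + 6\epsilon_3 + 4\epsilon_4 + \epsilon_5. \end{aligned}$$
Find a basis for $W^0$.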


9. Let $V$ be the vector space of all $2 \times 2$ matrices over the field of real numbers, and let
$$B = \begin{bmatrix} 2 & -2\\ -1 & 1 \end{bmatrix}.$$


Let $W$ be the subspace of $V$ consisting of all $A$ such that $AB = 0$. Let $f$ be a linear functional on $V$ which is in the annihilator of $W$. Suppose that $f(I) = 0$ and $f(C) = 3$, where $I$ is the $2 \times 2$ identity matrix and


$$C = \begin{bmatrix} 0 & 0\\ 0 & 1 \end{bmatrix}.$$
Find f(B). 


10. Let $F$ be a subfield of the complex numbers. We define $n$ linear functionals on $F^n$ ($n > 2$) by
$$f_k(x_1, \ldots, x_n) = \sum_{j=1}^n (k - j)x_j, \quad 1 \le k \le n.$$
What is the dimension of the subspace annihilated by $f_1, \ldots, f_n$?


11. Let $W_1$ and $W_2$ be subspaces of a finite-dimensional vector space $V$.
(a) Prove that $(W_1 + W_2)^0 = W_1^0 \cap W_2^0$.
(b) Prove that $(W_1 \cap W_2)^0 = W_1^0 + W_2^0$.


12. Let $V$ be a finite-dimensional vector space over the field $F$ and let $W$ be a subspace of $V$. If $f$ is a linear functional on $W$, prove that there is a linear functional $g$ on $V$ such that $g(\alpha) = f(\alpha)$ for each $\alpha$ in the subspace $W$.


13. Let $F$ be a subfield of the field of complex numbers and let $V$ be any vector space over $F$. Suppose that $f$ and $g$ are linear functionals on $V$ such that the function $h$ defined by $h(\alpha) = f(\alpha)g(\alpha)$ is also a linear functional on $V$. Prove that either $f = 0$ or $g = 0$.


14. Let $F$ be a field of characteristic zero and let $V$ be a finite-dimensional vector space over $F$. If $\alpha_1, \ldots, \alpha_m$ are finitely many vectors in $V$, each different from the zero vector, prove that there is a linear functional $f$ on $V$ such that
$$f(\alpha_i) \neq 0, \quad i = 1, \ldots, m.$$


15. According to Exercise 3, similar matrices have the same trace. Thus we can define the trace of a linear operator on a finite-dimensional space to be the trace of any matrix which represents the operator in an ordered basis. This is well-defined since all such representing matrices for one operator are similar.

Now let $V$ be the space of all $2 \times 2$ matrices over the field $F$ and let $P$ be a fixed $2 \times 2$ matrix. Let $T$ be the linear operator on $V$ defined by $T(A) = PA$. Prove that trace $(T)$ = 2 trace $(P)$.
16. Show that the trace functional on $n \times n$ matrices is unique in the following sense. If $W$ is the space of $n \times n$ matrices over the field $F$ and if $f$ is a linear functional on $W$ such that $f(AB) = f(BA)$ for each $A$ and $B$ in $W$, then $f$ is a scalar multiple of the trace function. If, in addition, $f(I) = n$, then $f$ is the trace function.


17. Let $W$ be the space of $n \times n$ matrices over the field $F$, and let $W_0$ be the subspace spanned by the matrices $C$ of the form $C = AB - BA$. Prove that $W_0$ is exactly the subspace of matrices which have trace zero. (Hint: What is the dimension of the space of matrices of trace zero? Use the matrix 'units,' i.e., matrices with exactly one non-zero entry, to construct enough linearly independent matrices of the form $AB - BA$.)
One question about dual bases which we did not answer in the last section was whether every basis for $V^*$ is the dual of some basis for $V$. One way to answer that question is to consider $V^{**}$, the dual space of $V^*$.

If $\alpha$ is a vector in $V$, then $\alpha$ induces a linear functional $L_\alpha$ on $V^*$ defined by
$$L_\alpha(f) = f(\alpha), \quad f \text{ in } V^*.$$
The fact that $L_\alpha$ is linear is just a reformulation of the definition of linear operations in $V^*$:
$$\begin{aligned} L_\alpha(cf + g) &= (cf + g)(\alpha)\\ &= (cf)(\alpha) + g(\alpha)\\ &= cf(\alpha) + g(\alpha)\\ &= cL_\alpha(f) + L_\alpha(g). \end{aligned}$$
If $V$ is finite-dimensional and $\alpha \neq 0$, then $L_\alpha \neq 0$; in other words, there exists a linear functional $f$ such that $f(\alpha) \neq 0$. The proof is very simple and was given in Section 3.5: Choose an ordered basis $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ for $V$ such that $\alpha_1 = \alpha$ and let $f$ be the linear functional which assigns to each vector in $V$ its first coordinate in the ordered basis $\mathcal{B}$.


Theorem 17. Let $V$ be a finite-dimensional vector space over the field $F$. For each vector $\alpha$ in $V$ define
$$L_\alpha(f) = f(\alpha), \quad f \text{ in } V^*.$$
The mapping $\alpha \to L_\alpha$ is then an isomorphism of $V$ onto $V^{**}$.

Proof. We showed that for each $\alpha$ the function $L_\alpha$ is linear. Suppose $\alpha$ and $\beta$ are in $V$ and $c$ is in $F$, and let $\gamma = c\alpha + \beta$. Then for each $f$ in $V^*$
$$\begin{aligned} L_\gamma(f) &= f(\gamma)\\ &= f(c\alpha + \beta)\\ &= cf(\alpha) + f(\beta)\\ &= cL_\alpha(f) + L_\beta(f) \end{aligned}$$
and so
$$L_\gamma = cL_\alpha + L_\beta.$$
This shows that the mapping $\alpha \to L_\alpha$ is a linear transformation from $V$ into $V^{**}$. This transformation is non-singular; for, according to the remarks above, $L_\alpha = 0$ if and only if $\alpha = 0$. Now $\alpha \to L_\alpha$ is a non-singular linear transformation from $V$ into $V^{**}$, and since
$$\dim V^{**} = \dim V^* = \dim V$$
Theorem 9 tells us that this transformation is invertible, and is therefore an isomorphism of $V$ onto $V^{**}$. ∎


Corollary. Let $V$ be a finite-dimensional vector space over the field $F$. If $L$ is a linear functional on the dual space $V^*$ of $V$, then there is a unique vector $\alpha$ in $V$ such that
$$L(f) = f(\alpha)$$
for every $f$ in $V^*$.

Corollary. Let $V$ be a finite-dimensional vector space over the field $F$. Each basis for $V^*$ is the dual of some basis for $V$.

Proof. Let $\mathcal{B}^* = \{f_1, \ldots, f_n\}$ be a basis for $V^*$. By Theorem 15, there is a basis $\{L_1, \ldots, L_n\}$ for $V^{**}$ such that
$$L_i(f_j) = \delta_{ij}.$$
Using the corollary above, for each $i$ there is a vector $\alpha_i$ in $V$ such that
$$L_i(f) = f(\alpha_i)$$
for every $f$ in $V^*$, i.e., such that $L_i = L_{\alpha_i}$. It follows immediately that $\{\alpha_1, \ldots, \alpha_n\}$ is a basis for $V$ and that $\mathcal{B}^*$ is the dual of this basis. ∎


In view of Theorem 17, we usually identify a with La and say that V 
‘is’ the dual space of V* or that the spaces V, V* are naturally in duality 
with one another. Each is the dual space of the other. In the last corollary 
we have an illustration of how that can be useful. Here is a further illustra- 
tion. 

If $E$ is a subset of $V^*$, then the annihilator $E^0$ is (technically) a subset of $V^{**}$. If we choose to identify $V$ and $V^{**}$ as in Theorem 17, then $E^0$ is a subspace of $V$, namely, the set of all $\alpha$ in $V$ such that $f(\alpha) = 0$ for all $f$ in $E$. In a corollary of Theorem 16 we noted that each subspace $W$ is determined by its annihilator $W^0$. How is it determined? The answer is that $W$ is the subspace annihilated by all $f$ in $W^0$, that is, the intersection of the null spaces of all $f$'s in $W^0$. In our present notation for annihilators, the answer may be phrased very simply: $W = (W^0)^0$.

Theorem 18. If $S$ is any subset of a finite-dimensional vector space $V$, then $(S^0)^0$ is the subspace spanned by $S$.
Proof. Let $W$ be the subspace spanned by $S$. Clearly $W^0 = S^0$. Therefore, what we are to prove is that $W = W^{00}$. We have given one proof. Here is another. By Theorem 16
$$\begin{aligned} \dim W + \dim W^0 &= \dim V\\ \dim W^0 + \dim W^{00} &= \dim V^* \end{aligned}$$
and since $\dim V = \dim V^*$ we have
$$\dim W = \dim W^{00}.$$
Since $W$ is a subspace of $W^{00}$, we see that $W = W^{00}$. ∎


The results of this section hold for arbitrary vector spaces; however, 
the proofs require the use of the so-called Axiom of Choice. We want to 
avoid becoming embroiled in a lengthy discussion of that axiom, so we shall 
not tackle annihilators for general vector spaces. But, there are two results 
about linear functionals on arbitrary vector spaces which are so fundamen- 
tal that we should include them. 

Let V be a vector space. We want to define hyperspaces in V. Unless 
V is finite-dimensional, we cannot do that with the dimension of the 
hyperspace. But, we can express the idea that a space N falls just one 
dimension short of filling out V, in the following way: 


1. N is a proper subspace of V; 
2. if W is a subspace of V which contains N, then either W = N or 
W =V. 


Conditions (1) and (2) together say that N is a proper subspace and there 
is no larger proper subspace, in short, N is a maximal proper subspace. 


Definition. If V is a vector space, a hyperspace in V is a maximal 
proper subspace of V. 


Theorem 19. If $f$ is a non-zero linear functional on the vector space $V$, then the null space of $f$ is a hyperspace in $V$. Conversely, every hyperspace in $V$ is the null space of a (not unique) non-zero linear functional on $V$.

Proof. Let $f$ be a non-zero linear functional on $V$ and $N_f$ its null space. Let $\alpha$ be a vector in $V$ which is not in $N_f$, i.e., a vector such that $f(\alpha) \neq 0$. We shall show that every vector in $V$ is in the subspace spanned by $N_f$ and $\alpha$. That subspace consists of all vectors
$$\gamma + c\alpha, \quad \gamma \text{ in } N_f, \ c \text{ in } F.$$
Let $\beta$ be in $V$. Define
$$c = \frac{f(\beta)}{f(\alpha)}$$
which makes sense because $f(\alpha) \neq 0$. Then the vector $\gamma = \beta - c\alpha$ is in $N_f$ since
$$f(\gamma) = f(\beta - c\alpha) = f(\beta) - cf(\alpha) = 0.$$
So $\beta$ is in the subspace spanned by $N_f$ and $\alpha$.

Now let $N$ be a hyperspace in $V$. Fix some vector $\alpha$ which is not in $N$. Since $N$ is a maximal proper subspace, the subspace spanned by $N$ and $\alpha$ is the entire space $V$. Therefore each vector $\beta$ in $V$ has the form
$$\beta = \gamma + c\alpha, \quad \gamma \text{ in } N, \ c \text{ in } F.$$
The vector $\gamma$ and the scalar $c$ are uniquely determined by $\beta$. If we have also
$$\beta = \gamma' + c'\alpha, \quad \gamma' \text{ in } N, \ c' \text{ in } F,$$
then
$$(c' - c)\alpha = \gamma - \gamma'.$$
If $c' - c \neq 0$, then $\alpha$ would be in $N$; hence, $c' = c$ and $\gamma' = \gamma$. Another way to phrase our conclusion is this: If $\beta$ is in $V$, there is a unique scalar $c$ such that $\beta - c\alpha$ is in $N$. Call that scalar $g(\beta)$. It is easy to see that $g$ is a linear functional on $V$ and that $N$ is the null space of $g$. ∎


Lemma. If $f$ and $g$ are linear functionals on a vector space $V$, then $g$ is a scalar multiple of $f$ if and only if the null space of $g$ contains the null space of $f$, that is, if and only if $f(\alpha) = 0$ implies $g(\alpha) = 0$.

Proof. If $f = 0$ then $g = 0$ as well and $g$ is trivially a scalar multiple of $f$. Suppose $f \neq 0$ so that the null space $N_f$ is a hyperspace in $V$. Choose some vector $\alpha$ in $V$ with $f(\alpha) \neq 0$ and let
$$c = \frac{g(\alpha)}{f(\alpha)}.$$
The linear functional $h = g - cf$ is 0 on $N_f$, since both $f$ and $g$ are 0 there, and $h(\alpha) = g(\alpha) - cf(\alpha) = 0$. Thus $h$ is 0 on the subspace spanned by $N_f$ and $\alpha$, and that subspace is $V$. We conclude that $h = 0$, i.e., that $g = cf$. ∎


Theorem 20. Let $g, f_1, \ldots, f_r$ be linear functionals on a vector space $V$ with respective null spaces $N, N_1, \ldots, N_r$. Then $g$ is a linear combination of $f_1, \ldots, f_r$ if and only if $N$ contains the intersection $N_1 \cap \cdots \cap N_r$.

Proof. If $g = c_1f_1 + \cdots + c_rf_r$ and $f_i(\alpha) = 0$ for each $i$, then clearly $g(\alpha) = 0$. Therefore, $N$ contains $N_1 \cap \cdots \cap N_r$.

We shall prove the converse (the 'if' half of the theorem) by induction on the number $r$. The preceding lemma handles the case $r = 1$. Suppose we know the result for $r = k - 1$, and let $f_1, \ldots, f_k$ be linear functionals with null spaces $N_1, \ldots, N_k$ such that $N_1 \cap \cdots \cap N_k$ is contained in $N$, the null space of $g$. Let $g', f_1', \ldots, f_{k-1}'$ be the restrictions of $g, f_1, \ldots, f_{k-1}$ to the subspace $N_k$. Then $g', f_1', \ldots, f_{k-1}'$ are linear functionals on the vector space $N_k$. Furthermore, if $\alpha$ is a vector in $N_k$ and $f_i'(\alpha) = 0$, $i = 1, \ldots, k - 1$, then $\alpha$ is in $N_1 \cap \cdots \cap N_k$ and so $g'(\alpha) = 0$. By the induction hypothesis (the case $r = k - 1$), there are scalars $c_i$ such that
$$g' = c_1f_1' + \cdots + c_{k-1}f_{k-1}'.$$
Now let
$$h = g - \sum_{i=1}^{k-1} c_if_i. \tag{3-16}$$
Then $h$ is a linear functional on $V$ and (3-16) tells us that $h(\alpha) = 0$ for every $\alpha$ in $N_k$. By the preceding lemma, $h$ is a scalar multiple of $f_k$. If $h = c_kf_k$, then
$$g = \sum_{i=1}^k c_if_i.$$
∎
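In $F^n$, Theorem 20 becomes a test that can be run by row-reduction: $g$ is a linear combination of $f_1, \ldots, f_r$ exactly when $g$ vanishes on a basis of the common null space $N_1 \cap \cdots \cap N_r$, and also exactly when appending $g$ as a new row does not raise the rank. The Python sketch below (hypothetical helpers `rref`, `solution_space`, `in_span`, `contains_intersection`; rational arithmetic) computes both conditions and checks that they agree for a few candidates built from the functionals of Example 23. Agreement on examples illustrates the theorem but does not prove it.

```python
from fractions import Fraction


def rref(rows):
    """Return (R, pivots): the row-reduced echelon form of a matrix and its pivot columns."""
    R = [[Fraction(x) for x in row] for row in rows]
    m = len(R)
    n = len(R[0]) if R else 0
    pivots, r = [], 0
    for c in range(n):
        p = next((i for i in range(r, m) if R[i][c] != 0), None)
        if p is None:
            continue
        R[r], R[p] = R[p], R[r]
        pivot = R[r][c]
        R[r] = [x / pivot for x in R[r]]
        for i in range(m):
            if i != r and R[i][c] != 0:
                factor = R[i][c]
                R[i] = [a - factor * b for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
        if r == m:
            break
    return R, pivots


def solution_space(rows):
    """A basis for the common null space of the functionals given as rows."""
    R, pivots = rref(rows)
    n = len(rows[0])
    basis = []
    for free in (col for col in range(n) if col not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for i, p in enumerate(pivots):
            v[p] = -R[i][free]
        basis.append(v)
    return basis


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def in_span(fs, g):
    """g is a linear combination of the rows fs iff appending g keeps the rank."""
    return len(rref(fs + [g])[1]) == len(rref(fs)[1])


def contains_intersection(fs, g):
    """The null space of g contains the intersection of the null spaces of the fs."""
    return all(dot(g, v) == 0 for v in solution_space(fs))


f1, f2 = [1, 2, 2, 1], [0, 2, 0, 1]       # f_1, f_2 of Example 23
candidates = {"f1 + f2": [1, 4, 2, 2], "x3": [0, 0, 1, 0], "f3": [-2, 0, -4, 3]}
for name, g in candidates.items():
    # Theorem 20: the two conditions agree for every g.
    assert in_span([f1, f2], g) == contains_intersection([f1, f2], g), name
assert in_span([f1, f2], [1, 4, 2, 2])
assert not in_span([f1, f2], [0, 0, 1, 0])
```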


Exercises 


1. Let $n$ be a positive integer and $F$ a field. Let $W$ be the set of all vectors $(x_1, \ldots, x_n)$ in $F^n$ such that $x_1 + \cdots + x_n = 0$.
(a) Prove that $W^0$ consists of all linear functionals $f$ of the form
$$f(x_1, \ldots, x_n) = c\sum_{j=1}^n x_j.$$
(b) Show that the dual space $W^*$ of $W$ can be 'naturally' identified with the linear functionals
$$f(x_1, \ldots, x_n) = c_1x_1 + \cdots + c_nx_n$$
on $F^n$ which satisfy $c_1 + \cdots + c_n = 0$.


2. Use Theorem 20 to prove the following. If $W$ is a subspace of a finite-dimensional vector space $V$ and if $\{g_1, \ldots, g_r\}$ is any basis for $W^0$, then
$$W = \bigcap_{i=1}^r N_{g_i}.$$


3. Let $S$ be a set, $F$ a field, and $V(S; F)$ the space of all functions from $S$ into $F$:
$$\begin{aligned} (f + g)(x) &= f(x) + g(x)\\ (cf)(x) &= cf(x). \end{aligned}$$
Let $W$ be any $n$-dimensional subspace of $V(S; F)$. Show that there exist points $x_1, \ldots, x_n$ in $S$ and functions $f_1, \ldots, f_n$ in $W$ such that $f_i(x_j) = \delta_{ij}$.
The Transpose of a Linear Transformation


Suppose that we have two vector spaces over the field $F$, $V$, and $W$, and a linear transformation $T$ from $V$ into $W$. Then $T$ induces a linear transformation from $W^*$ into $V^*$, as follows. Suppose $g$ is a linear functional on $W$, and let
$$f(\alpha) = g(T\alpha) \tag{3-17}$$
for each $\alpha$ in $V$. Then (3-17) defines a function $f$ from $V$ into $F$, namely, the composition of $T$, a function from $V$ into $W$, with $g$, a function from $W$ into $F$. Since both $T$ and $g$ are linear, Theorem 6 tells us that $f$ is also linear, i.e., $f$ is a linear functional on $V$. Thus $T$ provides us with a rule $T^t$ which associates with each linear functional $g$ on $W$ a linear functional $f = T^tg$ on $V$, defined by (3-17). Note also that $T^t$ is actually a linear transformation from $W^*$ into $V^*$; for, if $g_1$ and $g_2$ are in $W^*$ and $c$ is a scalar
$$\begin{aligned} [T^t(cg_1 + g_2)](\alpha) &= (cg_1 + g_2)(T\alpha)\\ &= cg_1(T\alpha) + g_2(T\alpha)\\ &= c(T^tg_1)(\alpha) + (T^tg_2)(\alpha) \end{aligned}$$
so that $T^t(cg_1 + g_2) = cT^tg_1 + T^tg_2$. Let us summarize.


Theorem 21. Let $V$ and $W$ be vector spaces over the field $F$. For each linear transformation $T$ from $V$ into $W$, there is a unique linear transformation $T^t$ from $W^*$ into $V^*$ such that
$$(T^tg)(\alpha) = g(T\alpha)$$
for every $g$ in $W^*$ and $\alpha$ in $V$.

We shall call $T^t$ the transpose of $T$. This transformation $T^t$ is often called the adjoint of $T$; however, we shall not use this terminology.

Theorem 22. Let $V$ and $W$ be vector spaces over the field $F$, and let $T$ be a linear transformation from $V$ into $W$. The null space of $T^t$ is the annihilator of the range of $T$. If $V$ and $W$ are finite-dimensional, then

(i) rank $(T^t)$ = rank $(T)$
(ii) the range of $T^t$ is the annihilator of the null space of $T$.


Proof. If $g$ is in $W^*$, then by definition
$$(T^tg)(\alpha) = g(T\alpha)$$
for each $\alpha$ in $V$. The statement that $g$ is in the null space of $T^t$ means that $g(T\alpha) = 0$ for every $\alpha$ in $V$. Thus the null space of $T^t$ is precisely the annihilator of the range of $T$.

Suppose that $V$ and $W$ are finite-dimensional, say $\dim V = n$ and $\dim W = m$. For (i): Let $r$ be the rank of $T$, i.e., the dimension of the range of $T$. By Theorem 16, the annihilator of the range of $T$ then has dimension $(m - r)$. By the first statement of this theorem, the nullity of $T^t$ must be $(m - r)$. But then since $T^t$ is a linear transformation on an $m$-dimensional space, the rank of $T^t$ is $m - (m - r) = r$, and so $T$ and $T^t$ have the same rank. For (ii): Let $N$ be the null space of $T$. Every functional in the range of $T^t$ is in the annihilator of $N$; for, suppose $f = T^tg$ for some $g$ in $W^*$; then, if $\alpha$ is in $N$
$$f(\alpha) = (T^tg)(\alpha) = g(T\alpha) = g(0) = 0.$$
Now the range of $T^t$ is a subspace of the space $N^0$, and
$$\dim N^0 = n - \dim N = \operatorname{rank}(T) = \operatorname{rank}(T^t)$$
so that the range of $T^t$ must be exactly $N^0$. ∎


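Theorem 23. Let $V$ and $W$ be finite-dimensional vector spaces over the field $F$. Let $\mathcal{B}$ be an ordered basis for $V$ with dual basis $\mathcal{B}^*$, and let $\mathcal{B}'$ be an ordered basis for $W$ with dual basis $\mathcal{B}'^*$. Let $T$ be a linear transformation from $V$ into $W$; let $A$ be the matrix of $T$ relative to $\mathcal{B}, \mathcal{B}'$ and let $B$ be the matrix of $T^t$ relative to $\mathcal{B}'^*, \mathcal{B}^*$. Then $B_{ij} = A_{ji}$.

Proof. Let
$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}, \quad \mathcal{B}' = \{\beta_1, \ldots, \beta_m\}, \quad \mathcal{B}^* = \{f_1, \ldots, f_n\}, \quad \mathcal{B}'^* = \{g_1, \ldots, g_m\}.$$
By definition,
$$\begin{aligned} T\alpha_j &= \sum_{i=1}^m A_{ij}\beta_i, & j &= 1, \ldots, n\\ T^tg_j &= \sum_{i=1}^n B_{ij}f_i, & j &= 1, \ldots, m. \end{aligned}$$
On the other hand,
$$\begin{aligned} (T^tg_j)(\alpha_i) &= g_j(T\alpha_i)\\ &= g_j\Big(\sum_{k=1}^m A_{ki}\beta_k\Big)\\ &= \sum_{k=1}^m A_{ki}g_j(\beta_k)\\ &= \sum_{k=1}^m A_{ki}\delta_{jk}\\ &= A_{ji}. \end{aligned}$$
For any linear functional $f$ on $V$
$$f = \sum_{i=1}^n f(\alpha_i)f_i.$$
If we apply this formula to the functional $f = T^tg_j$ and use the fact that $(T^tg_j)(\alpha_i) = A_{ji}$, we have
$$T^tg_j = \sum_{i=1}^n A_{ji}f_i$$
from which it immediately follows that $B_{ij} = A_{ji}$. ∎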
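Theorem 23 can be checked directly with standard bases in $F^n$ and $F^m$, writing each functional as its coefficient row (its coordinates in the dual standard basis). Then $(T^tg)(\alpha) = g(A\alpha)$, so the coefficient row of $T^tg$ is the row vector $gA$; taking $g = g_1, \ldots, g_m$ and using these rows as the columns of $B$ gives $B = A^t$. The Python sketch below assumes that row-vector convention; the helper names and the matrix `A` are hypothetical illustrations.

```python
from fractions import Fraction


def mat_vec(A, x):
    """A x for a matrix A (list of rows) and a column vector x (list)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def row_mat(g, A):
    """g A for a row vector g of length m and an m x n matrix A."""
    return [sum(g[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0]))]


def transpose_map(A):
    """Matrix of T^t relative to the dual standard bases: column j = coordinates of T^t g_j."""
    m = len(A)
    n = len(A[0])
    dual_std = [[int(i == j) for i in range(m)] for j in range(m)]  # g_1, ..., g_m
    cols = [row_mat(g, A) for g in dual_std]   # coordinates of T^t g_j in f_1, ..., f_n
    return [[cols[j][i] for j in range(m)] for i in range(n)]


A = [[1, 2, 0], [-1, 3, 4]]          # a hypothetical 2 x 3 matrix of T: F^3 -> F^2
B = transpose_map(A)
assert B == [[A[j][i] for j in range(2)] for i in range(3)]   # B_ij = A_ji

# (T^t g)(alpha) = g(T alpha) for a sample g and alpha
g = [Fraction(2), Fraction(-5)]
alpha = [Fraction(1), Fraction(0), Fraction(7)]
lhs = sum(c * x for c, x in zip(row_mat(g, A), alpha))
rhs = sum(c * y for c, y in zip(g, mat_vec(A, alpha)))
assert lhs == rhs
```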
Definition. If $A$ is an $m \times n$ matrix over the field $F$, the transpose of $A$ is the $n \times m$ matrix $A^t$ defined by $A^t_{ij} = A_{ji}$.

Theorem 23 thus states that if $T$ is a linear transformation from $V$ into $W$, the matrix of which in some pair of bases is $A$, then the transpose transformation $T^t$ is represented in the dual pair of bases by the transpose matrix $A^t$.


Theorem 24. Let A be any m X n matrix over the field F. Then the 
row rank of A ts equal to the column rank of A. 


Proof. Let $\mathcal{B}$ be the standard ordered basis for $F^n$ and $\mathcal{B}'$ the
standard ordered basis for $F^m$. Let T be the linear transformation from $F^n$
into $F^m$ such that the matrix of T relative to the pair $\mathcal{B}, \mathcal{B}'$ is A, i.e.,
$$T(x_1, \dots, x_n) = (y_1, \dots, y_m)$$
where
$$y_i = \sum_{j=1}^{n} A_{ij} x_j.$$
The column rank of A is the rank of the transformation T, because the
range of T consists of all m-tuples which are linear combinations of the
column vectors of A.

Relative to the dual bases $\mathcal{B}'^*$ and $\mathcal{B}^*$, the transpose mapping $T^t$ is
represented by the matrix $A^t$. Since the columns of $A^t$ are the rows of A,
we see by the same reasoning that the row rank of A (the column rank of $A^t$)
is equal to the rank of $T^t$. By Theorem 22, T and $T^t$ have the same rank,
and hence the row rank of A is equal to the column rank of A. ∎
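Theorem 24 can be checked on a concrete matrix. The sketch below is an illustration, not the book's method: it computes the dimension of the row space by Gaussian elimination over the rationals, and applying the same routine to $A^t$ gives the column rank of A. The sample matrix and the function names `rank` and `transpose` are hypothetical.

```python
from fractions import Fraction


def rank(M):
    """Dimension of the row space of M (a list of rows), by Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0  # number of pivot rows found so far
    for c in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                factor = rows[i][c] / rows[r][c]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical 3 x 4 example: the second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [0, 1, 1, 0]]

row_rank = rank(A)                 # dimension of the span of the rows of A
column_rank = rank(transpose(A))   # the rows of A^t are the columns of A
print(row_rank, column_rank)       # 2 2
```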


Now we see that if A is an $m \times n$ matrix over F and T is the linear
transformation from $F^n$ into $F^m$ defined above, then
$$\operatorname{rank}(T) = \text{row rank}(A) = \text{column rank}(A)$$
and we shall call this number simply the rank of A.


EXAMPLE 25. This example will be of a general nature, more discussion
than example. Let V be an n-dimensional vector space over the
field F, and let T be a linear operator on V. Suppose $\mathcal{B} = \{\alpha_1, \dots, \alpha_n\}$
is an ordered basis for V. The matrix of T in the ordered basis $\mathcal{B}$ is defined
to be the $n \times n$ matrix A such that
$$T\alpha_j = \sum_{i=1}^{n} A_{ij}\alpha_i,$$
in other words, $A_{ij}$ is the ith coordinate of the vector $T\alpha_j$ in the ordered
basis $\mathcal{B}$. If $\{f_1, \dots, f_n\}$ is the dual basis of $\mathcal{B}$, this can be stated simply
$$A_{ij} = f_i(T\alpha_j).$$
Let us see what happens when we change basis. Suppose
$$\mathcal{B}' = \{\alpha'_1, \dots, \alpha'_n\}$$
is another ordered basis for V, with dual basis $\{f'_1, \dots, f'_n\}$. If B is the
matrix of T in the ordered basis $\mathcal{B}'$, then
$$B_{ij} = f'_i(T\alpha'_j).$$
Let U be the invertible linear operator such that $U\alpha_j = \alpha'_j$. Then the
transpose of U is given by $U^t f'_i = f_i$. It is easy to verify that since U is
invertible, so is $U^t$ and $(U^t)^{-1} = (U^{-1})^t$. Thus $f'_i = (U^{-1})^t f_i$, $i = 1, \dots, n$.
Therefore,
$$B_{ij} = [(U^{-1})^t f_i](T\alpha'_j) = f_i(U^{-1}T\alpha'_j) = f_i(U^{-1}TU\alpha_j).$$
Now what does this say? Well, $f_i(U^{-1}TU\alpha_j)$ is the i, j entry of the matrix
of $U^{-1}TU$ in the ordered basis $\mathcal{B}$. Our computation above shows that this
scalar is also the i, j entry of the matrix of T in the ordered basis $\mathcal{B}'$. In
other words
$$[T]_{\mathcal{B}'} = [U^{-1}TU]_{\mathcal{B}} = [U^{-1}]_{\mathcal{B}}[T]_{\mathcal{B}}[U]_{\mathcal{B}} = [U]_{\mathcal{B}}^{-1}[T]_{\mathcal{B}}[U]_{\mathcal{B}}$$
and this is precisely the change-of-basis formula which we derived earlier.
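The computation above can be replayed numerically. In the sketch below (an illustration under our own assumptions: $V = F^2$, $\mathcal{B}$ the standard basis, and hypothetical sample matrices), the columns of P are the vectors $\alpha'_j = U\alpha_j$, so $P = [U]_{\mathcal{B}}$, and the dual functionals $f'_i$ are the rows of $P^{-1}$. Computing $B_{ij} = f'_i(T\alpha'_j)$ directly gives the same matrix as $[U]_{\mathcal{B}}^{-1}[T]_{\mathcal{B}}[U]_{\mathcal{B}}$.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def inverse(M):
    """Gauss-Jordan inverse over the rationals; M is assumed invertible."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = next(i for i in range(c, n) if aug[i][c] != 0)
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for i in range(n):
            if i != c:
                factor = aug[i][c]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[c])]
    return [row[n:] for row in aug]


A = [[2, 1], [0, 3]]   # hypothetical [T]_B, B the standard basis of F^2
P = [[1, 1], [1, 2]]   # column j holds alpha'_j = U alpha_j, so P = [U]_B
P_inv = inverse(P)


def T(v):
    return [sum(A[r][k] * v[k] for k in range(2)) for r in range(2)]


def f_prime(i, v):
    """The dual-basis functional f'_i: row i of P^{-1}, since (P^{-1} P)_{ij} = delta_ij."""
    return sum(P_inv[i][k] * v[k] for k in range(2))


alpha_prime = [[P[0][j], P[1][j]] for j in range(2)]
B_direct = [[f_prime(i, T(alpha_prime[j])) for j in range(2)] for i in range(2)]
B_formula = matmul(matmul(P_inv, A), P)   # [U]^{-1} [T] [U]
print(B_direct == B_formula)              # True
```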


Exercises 


1. Let F be a field and let f be the linear functional on $F^2$ defined by $f(x_1, x_2) = ax_1 + bx_2$.
For each of the following linear operators T, let $g = T^t f$, and find
$g(x_1, x_2)$.

(a) $T(x_1, x_2) = (x_1, 0)$;
(b) $T(x_1, x_2) = (-x_2, x_1)$;
(c) $T(x_1, x_2) = (x_1 - x_2, x_1 + x_2)$.

2. Let V be the vector space of all polynomial functions over the field of real
numbers. Let a and b be fixed real numbers and let f be the linear functional on V
defined by
$$f(p) = \int_a^b p(x)\,dx.$$
If D is the differentiation operator on V, what is $D^t f$?


3. Let V be the space of all $n \times n$ matrices over a field F and let B be a fixed
$n \times n$ matrix. If T is the linear operator on V defined by $T(A) = AB - BA$,
and if f is the trace function, what is $T^t f$?


4. Let V be a finite-dimensional vector space over the field F and let T be a
linear operator on V. Let c be a scalar and suppose there is a non-zero vector $\alpha$
in V such that $T\alpha = c\alpha$. Prove that there is a non-zero linear functional f on V
such that $T^t f = cf$.
5. Let A be an $m \times n$ matrix with real entries. Prove that $A = 0$ if and only
if $\operatorname{trace}(A^tA) = 0$.

6. Let n be a positive integer and let V be the space of all polynomial functions
over the field of real numbers which have degree at most n, i.e., functions of the
form
$$f(x) = c_0 + c_1x + \cdots + c_nx^n.$$
Let D be the differentiation operator on V. Find a basis for the null space of the
transpose operator $D^t$.


7. Let V be a finite-dimensional vector space over the field F. Show that $T \to T^t$
is an isomorphism of L(V, V) onto L(V*, V*).

8. Let V be the vector space of $n \times n$ matrices over the field F.
(a) If B is a fixed $n \times n$ matrix, define a function $f_B$ on V by $f_B(A) = \operatorname{trace}(B^tA)$. Show that $f_B$ is a linear functional on V.
(b) Show that every linear functional on V is of the above form, i.e., is $f_B$
for some B.
(c) Show that $B \to f_B$ is an isomorphism of V onto V*.


4. Polynomials 


4.1. Algebras 


The purpose of this chapter is to establish a few of the basic prop- 
erties of the algebra of polynomials over a field. The discussion will be 
facilitated if we first introduce the concept of a linear algebra over a field. 


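Definition. Let F be a field. A linear algebra over the field F is a
vector space $\mathfrak{A}$ over F with an additional operation called multiplication of
vectors which associates with each pair of vectors $\alpha, \beta$ in $\mathfrak{A}$ a vector $\alpha\beta$ in
$\mathfrak{A}$ called the product of $\alpha$ and $\beta$ in such a way that

(a) multiplication is associative,
$$\alpha(\beta\gamma) = (\alpha\beta)\gamma;$$
(b) multiplication is distributive with respect to addition,
$$\alpha(\beta + \gamma) = \alpha\beta + \alpha\gamma \quad\text{and}\quad (\alpha + \beta)\gamma = \alpha\gamma + \beta\gamma;$$
(c) for each scalar c in F,
$$c(\alpha\beta) = (c\alpha)\beta = \alpha(c\beta).$$
If there is an element 1 in $\mathfrak{A}$ such that $1\alpha = \alpha 1 = \alpha$ for each $\alpha$ in $\mathfrak{A}$,
we call $\mathfrak{A}$ a linear algebra with identity over F, and call 1 the identity
of $\mathfrak{A}$. The algebra $\mathfrak{A}$ is called commutative if $\alpha\beta = \beta\alpha$ for all $\alpha$ and $\beta$ in $\mathfrak{A}$.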


EXAMPLE 1. The set of $n \times n$ matrices over a field, with the usual
operations, is a linear algebra with identity; in particular the field itself
is an algebra with identity. This algebra is not commutative if $n \geq 2$.
The field itself is (of course) commutative.
EXAMPLE 2. The space of all linear operators on a vector space, with
composition as the product, is a linear algebra with identity. It is commutative
if and only if the space is one-dimensional.


The reader may have had some experience with the dot product and
cross product of vectors in $R^3$. If so, he should observe that neither of
these products is of the type described in the definition of a linear algebra.
The dot product is a ‘scalar product,’ that is, it associates with a pair of
vectors a scalar, and thus it is certainly not the type of product we are
presently discussing. The cross product does associate a vector with each
pair of vectors in $R^3$; however, this is not an associative multiplication.

The rest of this section will be devoted to the construction of an
algebra which is significantly different from the algebras in either of the
preceding examples. Let F be a field and S the set of non-negative integers.
By Example 3 of Chapter 2, the set of all functions from S into
F is a vector space over F. We shall denote this vector space by $F^\infty$. The
vectors in $F^\infty$ are therefore infinite sequences $f = (f_0, f_1, f_2, \dots)$ of scalars
$f_i$ in F. If $g = (g_0, g_1, g_2, \dots)$, $g_i$ in F, and a, b are scalars in F, $af + bg$ is
the infinite sequence given by
$$af + bg = (af_0 + bg_0,\ af_1 + bg_1,\ af_2 + bg_2,\ \dots). \tag{4-1}$$
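We define a product in $F^\infty$ by associating with each pair of vectors f and
g in $F^\infty$ the vector fg which is given by
$$(fg)_n = \sum_{i=0}^{n} f_i g_{n-i}, \qquad n = 0, 1, 2, \dots. \tag{4-2}$$
Thus
$$fg = (f_0g_0,\ f_0g_1 + f_1g_0,\ f_0g_2 + f_1g_1 + f_2g_0,\ \dots)$$
and as
$$(gf)_n = \sum_{i=0}^{n} g_i f_{n-i} = \sum_{i=0}^{n} f_i g_{n-i} = (fg)_n$$
for $n = 0, 1, 2, \dots$, it follows that multiplication is commutative, $fg = gf$.
If h also belongs to $F^\infty$, then
$$\begin{aligned}
[(fg)h]_n &= \sum_{i=0}^{n} (fg)_i h_{n-i} = \sum_{i=0}^{n} \Big(\sum_{j=0}^{i} f_j g_{i-j}\Big) h_{n-i} = \sum_{i=0}^{n} \sum_{j=0}^{i} f_j g_{i-j} h_{n-i} \\
&= \sum_{j=0}^{n} f_j \sum_{i=0}^{n-j} g_i h_{n-i-j} = \sum_{j=0}^{n} f_j (gh)_{n-j} = [f(gh)]_n
\end{aligned}$$
for $n = 0, 1, 2, \dots$, so that
$$(fg)h = f(gh). \tag{4-3}$$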

We leave it to the reader to verify that the multiplication defined by (4-2)
satisfies (b) and (c) in the definition of a linear algebra, and that the
vector $1 = (1, 0, 0, \dots)$ serves as an identity for $F^\infty$. Then $F^\infty$, with the
operations defined above, is a commutative linear algebra with identity
over the field F.
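The product (4-2) can be computed exactly on truncated sequences, because $(fg)_n$ depends only on $f_0, \dots, f_n$ and $g_0, \dots, g_n$. The sketch below (a hypothetical helper `cauchy_product`; sequences stored as Python lists, entries beyond the end of a list read as 0) checks commutativity and (4-3) on the first few coordinates, and shows that $(1, 1, 1, \dots)(1, -1, 0, 0, \dots) = (1, 0, 0, \dots)$, the identity of $F^\infty$.

```python
def cauchy_product(f, g, N):
    """First N coordinates of fg in F^infinity, using (fg)_n = sum_{i=0}^{n} f_i g_{n-i}.

    f and g are finite lists holding the leading coordinates of the sequences;
    entries beyond the end of a list are treated as 0.
    """
    def coord(s, k):
        return s[k] if k < len(s) else 0

    return [sum(coord(f, i) * coord(g, n - i) for i in range(n + 1)) for n in range(N)]


N = 6
ones = [1] * N          # first N coordinates of (1, 1, 1, ...)
one_minus_x = [1, -1]   # the polynomial 1 - x
h = [2, 0, 3]           # an arbitrary third element, 2 + 3x^2

print(cauchy_product(ones, one_minus_x, N))  # [1, 0, 0, 0, 0, 0]
print(cauchy_product([0, 1], [0, 1], 4))     # x * x = x^2, i.e. [0, 0, 1, 0]
```

Truncating to N coordinates is harmless here: every coordinate of index below N is computed from inputs of index below N, so nested products such as $(fg)h$ are also exact up to index $N - 1$.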

The vector $(0, 1, 0, \dots, 0, \dots)$ plays a distinguished role in what
follows and we shall consistently denote it by x. Throughout this chapter
x will never be used to denote an element of the field F. The product of x
with itself n times will be denoted by $x^n$ and we shall put $x^0 = 1$. Then
$$x^2 = (0, 0, 1, 0, \dots), \qquad x^3 = (0, 0, 0, 1, 0, \dots)$$
and in general for each integer $k \geq 0$, $(x^k)_k = 1$ and $(x^k)_n = 0$ for all
non-negative integers $n \neq k$. In concluding this section we observe that the
set consisting of $1, x, x^2, \dots$ is both independent and infinite. Thus the
algebra $F^\infty$ is not finite-dimensional.

The algebra $F^\infty$ is sometimes called the algebra of formal power
series over F. The element $f = (f_0, f_1, f_2, \dots)$ is frequently written
$$f = \sum_{n=0}^{\infty} f_n x^n. \tag{4-4}$$


This notation is very convenient for dealing with the algebraic operations. 
When used, it must be remembered that it is purely formal. There are no 
‘infinite sums’ in algebra, and the power series notation (4-4) is not in- 
tended to suggest anything about convergence, if the reader knows what 
that is. By using sequences, we were able to define carefully an algebra 
in which the operations behave like addition and multiplication of formal 
power series, without running the risk of confusion over such things as 
infinite sums. 
We are now in a position to define a polynomial over the field F.

Definition. Let F[x] be the subspace of $F^\infty$ spanned by the vectors
$1, x, x^2, \dots$. An element of F[x] is called a polynomial over F.

Since F[x] consists of all (finite) linear combinations of x and its
powers, a non-zero vector f in $F^\infty$ is a polynomial if and only if there is
an integer $n \geq 0$ such that $f_n \neq 0$ and such that $f_k = 0$ for all integers
$k > n$; this integer (when it exists) is obviously unique and is called the
degree of f. We denote the degree of a polynomial f by deg f, and do
not assign a degree to the 0-polynomial. If f is a non-zero polynomial of
degree n it follows that
$$f = f_0x^0 + f_1x + f_2x^2 + \cdots + f_nx^n, \qquad f_n \neq 0. \tag{4-5}$$


The scalars $f_0, f_1, \dots, f_n$ are sometimes called the coefficients of f, and
we may say that f is a polynomial with coefficients in F. We shall call
polynomials of the form $cx^0$ scalar polynomials, and frequently write c
for $cx^0$. A non-zero polynomial f of degree n such that $f_n = 1$ is said to
be a monic polynomial.

The reader should note that polynomials are not the same sort of
objects as the polynomial functions on F which we have discussed on
several occasions. If F contains an infinite number of elements, there is a
natural isomorphism between F[x] and the algebra of polynomial functions
on F. We shall discuss that in the next section. Let us verify that
F[x] is an algebra.


Theorem 1. Let f and g be non-zero polynomials over F. Then

(i) fg is a non-zero polynomial;
(ii) $\deg(fg) = \deg f + \deg g$;
(iii) fg is a monic polynomial if both f and g are monic polynomials;
(iv) fg is a scalar polynomial if and only if both f and g are scalar
polynomials;
(v) if $f + g \neq 0$,
$$\deg(f + g) \leq \max(\deg f, \deg g).$$


Proof. Suppose f has degree m and that g has degree n. If k is a
non-negative integer,
$$(fg)_{m+n+k} = \sum_{i=0}^{m+n+k} f_i g_{m+n+k-i}.$$
In order that $f_i g_{m+n+k-i} \neq 0$, it is necessary that $i \leq m$ and $m + n + k - i \leq n$.
Hence it is necessary that $m + k \leq i \leq m$, which implies
$k = 0$ and $i = m$. Thus
$$(fg)_{m+n} = f_m g_n \tag{4-6}$$
and
$$(fg)_{m+n+k} = 0, \qquad k > 0. \tag{4-7}$$
The statements (i), (ii), (iii) follow immediately from (4-6) and (4-7),
while (iv) is a consequence of (i) and (ii). We leave the verification of (v)
to the reader. ∎
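Theorem 1 is easy to observe with polynomials stored as coefficient lists $[f_0, f_1, \dots, f_n]$, mirroring the sequence picture of $F^\infty$. This is a minimal sketch with hypothetical helper names (`trim`, `deg`, `poly_mul`, `is_monic`) and sample polynomials; the 0-polynomial is represented by the empty list and, as in the text, has no degree.

```python
def trim(p):
    """Drop trailing zero coefficients; the 0-polynomial becomes the empty list."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def deg(p):
    """Degree of a non-zero polynomial given as [f_0, f_1, ..., f_n]."""
    p = trim(p)
    if not p:
        raise ValueError("the 0-polynomial is not assigned a degree")
    return len(p) - 1


def poly_mul(f, g):
    """Product in F[x]: each term f_i g_j contributes to the coefficient of x^(i+j), as in (4-10)."""
    f, g = trim(f), trim(g)
    if not f or not g:
        return []
    prod = [0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return trim(prod)


def is_monic(p):
    p = trim(p)
    return bool(p) and p[-1] == 1


f = [-1, 3, 1]     # x^2 + 3x - 1 (monic)
g = [2, 0, 0, 1]   # x^3 + 2 (monic)
fg = poly_mul(f, g)
print(fg, deg(fg))  # [-2, 6, 2, -1, 3, 1] 5
```

The leading coefficient of the product is always $f_m g_n$, as (4-6) says; for instance `poly_mul([1, 2], [0, 3])` ends in 6.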


Corollary 1. The set of all polynomials over a given field F equipped
with the operations (4-1) and (4-2) is a commutative linear algebra with
identity over F.
Proof. Since the operations (4-1) and (4-2) are those defined in
the algebra $F^\infty$ and since F[x] is a subspace of $F^\infty$, it suffices to prove that
the product of two polynomials is again a polynomial. This is trivial when
one of the factors is 0 and otherwise follows from (i). ∎


Corollary 2. Suppose f, g, and h are polynomials over the field F such
that $f \neq 0$ and $fg = fh$. Then $g = h$.
Proof. Since $fg = fh$, $f(g - h) = 0$, and as $f \neq 0$ it follows at
once from (i) that $g - h = 0$. ∎


Certain additional facts follow rather easily from the proof of Theorem
1, and we shall mention some of these.
Suppose
$$f = \sum_{i=0}^{m} f_i x^i \quad\text{and}\quad g = \sum_{j=0}^{n} g_j x^j.$$
Then from (4-7) we obtain,
$$fg = \sum_{s=0}^{m+n} \Big(\sum_{r=0}^{s} f_r g_{s-r}\Big) x^s. \tag{4-8}$$
The reader should verify, in the special case $f = cx^m$, $g = dx^n$ with c, d in
F, that (4-8) reduces to
$$(cx^m)(dx^n) = cd\,x^{m+n}. \tag{4-9}$$
Now from (4-9) and the distributive laws in F[x], it follows that the
product in (4-8) is also given by
$$\sum_{i,j} f_i g_j x^{i+j} \tag{4-10}$$
where the sum is extended over all integer pairs i, j such that $0 \leq i \leq m$
and $0 \leq j \leq n$.


Definition. Let $\mathfrak{A}$ be a linear algebra with identity over the field F. We
shall denote the identity of $\mathfrak{A}$ by 1 and make the convention that $\alpha^0 = 1$ for
each $\alpha$ in $\mathfrak{A}$. Then to each polynomial $f = \sum_{i=0}^{n} f_i x^i$ over F and $\alpha$ in $\mathfrak{A}$ we
associate an element $f(\alpha)$ in $\mathfrak{A}$ by the rule
$$f(\alpha) = \sum_{i=0}^{n} f_i \alpha^i.$$


EXAMPLE 3. Let C be the field of complex numbers and let $f = x^2 + 2$.

(a) If $\mathfrak{A} = C$ and z belongs to C, $f(z) = z^2 + 2$; in particular $f(2) = 6$.

(b) If $\mathfrak{A}$ is the algebra of all $2 \times 2$ matrices over C and if
$$B = \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}$$
then
$$f(B) = 2\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} + \begin{bmatrix} 1 & 0 \\ -1 & 2 \end{bmatrix}^2 = \begin{bmatrix} 3 & 0 \\ -3 & 6 \end{bmatrix}.$$

(c) If $\mathfrak{A}$ is the algebra of all linear operators on $C^3$ and T is the element
of $\mathfrak{A}$ given by
$$T(c_1, c_2, c_3) = (i\sqrt{2}\,c_1,\ c_2,\ i\sqrt{2}\,c_3)$$
then f(T) is the linear operator on $C^3$ defined by
$$f(T)(c_1, c_2, c_3) = (0, 3c_2, 0).$$

(d) If $\mathfrak{A}$ is the algebra of all polynomials over C and $g = x^4 + 3i$,
then f(g) is the polynomial in $\mathfrak{A}$ given by
$$f(g) = -7 + 6ix^4 + x^8.$$

The observant reader may notice in connection with this last example
that if f is a polynomial over any field and x is the polynomial $(0, 1, 0, \dots)$
then $f = f(x)$, but he is advised to forget this fact.
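Because $f(\alpha)$ uses only the sum, the product, scalar multiples and the identity of $\mathfrak{A}$, one routine can evaluate f in each of the algebras of Example 3. The sketch below is an illustration under our own assumptions: it uses Horner's rule, a rearrangement of $\sum f_i\alpha^i$ that is valid here because powers of a single $\alpha$ commute with each other and with scalars, and the names `evaluate`, `mat_add`, `mat_mul`, `mat_scale` and `c_ops` are hypothetical. It reproduces parts (a) and (b).

```python
def evaluate(coeffs, alpha, one, add, mul, scale):
    """f(alpha) = sum_i f_i alpha^i for coeffs = [f_0, ..., f_n], via Horner's rule:
    f(alpha) = (...((f_n alpha + f_{n-1}) alpha + f_{n-2}) ...) alpha + f_0."""
    result = scale(0, one)  # the zero element of the algebra
    for c in reversed(coeffs):
        result = add(mul(result, alpha), scale(c, one))
    return result


# The algebra of 2 x 2 matrices (lists of rows).
def mat_add(X, Y):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(X, Y)]


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def mat_scale(c, X):
    return [[c * a for a in r] for r in X]


f = [2, 0, 1]  # f = x^2 + 2
I2 = [[1, 0], [0, 1]]
B = [[1, 0], [-1, 2]]

print(evaluate(f, B, I2, mat_add, mat_mul, mat_scale))  # [[3, 0], [-3, 6]]

# The field C itself, with Python's built-in complex arithmetic.
c_ops = (lambda a, b: a + b, lambda a, b: a * b, lambda c, a: c * a)
print(evaluate(f, 2, 1, *c_ops))  # 6
```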


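Theorem 2. Let F be a field and $\mathfrak{A}$ be a linear algebra with identity
over F. Suppose f and g are polynomials over F, that $\alpha$ is an element of $\mathfrak{A}$,
and that c belongs to F. Then

(i) $(cf + g)(\alpha) = cf(\alpha) + g(\alpha)$;
(ii) $(fg)(\alpha) = f(\alpha)g(\alpha)$.

Proof. As (i) is quite easy to establish, we shall only prove (ii).
Suppose
$$f = \sum_{i=0}^{m} f_i x^i \quad\text{and}\quad g = \sum_{j=0}^{n} g_j x^j.$$
By (4-10),
$$fg = \sum_{i,j} f_i g_j x^{i+j}$$
and hence by (i),
$$(fg)(\alpha) = \sum_{i,j} f_i g_j \alpha^{i+j} = \Big(\sum_{i=0}^{m} f_i \alpha^i\Big)\Big(\sum_{j=0}^{n} g_j \alpha^j\Big) = f(\alpha)g(\alpha).$$
∎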
Exercises 


1. Let F be a subfield of the complex numbers and let A be the following $2 \times 2$
matrix over F
For each of the following polynomials f over F, compute f(A).
(a) $f = x^2 - x + 2$;
(b) $f = x^3 - 1$;
(c) $f = x^2 - 5x + 7$.


2. Let T be the linear operator on $R^3$ defined by
$$T(x_1, x_2, x_3) = (x_1, x_2, -2x_2 - x_3).$$
Let f be the polynomial over R defined by $f = -x^3 + 2$. Find f(T).


3. Let A be an $n \times n$ diagonal matrix over the field F, i.e., a matrix satisfying
$A_{ij} = 0$ for $i \neq j$. Let f be the polynomial over F defined by
$$f = (x - A_{11}) \cdots (x - A_{nn}).$$
What is the matrix f(A)?


4. If f and g are independent polynomials over a field F and h is a non-zero
polynomial over F, show that fh and gh are independent.


5. If F is a field, show that the product of two non-zero elements of $F^\infty$ is non-zero.


6. Let S be a set of non-zero polynomials over a field F. If no two elements of S
have the same degree, show that S is an independent set in F[x].


7. If a and b are elements of a field F and $a \neq 0$, show that the polynomials 1,
$ax + b$, $(ax + b)^2$, $(ax + b)^3$, ... form a basis of F[x].


8. If F is a field and h is a polynomial over F of degree $\geq 1$, show that the mapping
$f \to f(h)$ is a one-one linear transformation of F[x] into F[x]. Show that this
transformation is an isomorphism of F[x] onto F[x] if and only if deg h = 1.


9. Let F be a subfield of the complex numbers and let T, D be the transformations
on F[x] defined by
$$T\Big(\sum_{i=0}^{n} c_i x^i\Big) = \sum_{i=0}^{n} \frac{c_i}{1 + i} x^{i+1}$$
and
$$D\Big(\sum_{i=0}^{n} c_i x^i\Big) = \sum_{i=1}^{n} i c_i x^{i-1}.$$

(a) Show that T is a non-singular linear operator on F[x]. Show also that T
is not invertible.

(b) Show that D is a linear operator on F[x] and find its null space.

(c) Show that $DT = I$, and $TD \neq I$.

(d) Show that $T[(Tf)g] = (Tf)(Tg) - T[f(Tg)]$ for all f, g in F[x].

(e) State and prove a rule for D similar to the one given for T in (d).

(f) Suppose V is a non-zero subspace of F[x] such that Tf belongs to V for
each f in V. Show that V is not finite-dimensional.

(g) Suppose V is a finite-dimensional subspace of F[x]. Prove there is an
integer $m \geq 0$ such that $D^m f = 0$ for each f in V.
4.3. Lagrange Interpolation


Throughout this section we shall assume F is a fixed field and that
$t_0, t_1, \dots, t_n$ are n + 1 distinct elements of F. Let V be the subspace of
F[x] consisting of all polynomials of degree less than or equal to n (together
with the 0-polynomial), and let $L_i$ be the function from V into F
defined for f in V by
$$L_i(f) = f(t_i), \qquad 0 \leq i \leq n.$$
By part (i) of Theorem 2, each $L_i$ is a linear functional on V, and one of
the things we intend to show is that the set consisting of $L_0, L_1, \dots, L_n$
is a basis for V*, the dual space of V.

Of course in order that this be so, it is sufficient (cf. Theorem 15 of
Chapter 3) that $\{L_0, L_1, \dots, L_n\}$ be the dual of a basis $\{P_0, P_1, \dots, P_n\}$
of V. There is at most one such basis, and if it exists it is characterized by
$$L_j(P_i) = P_i(t_j) = \delta_{ij}. \tag{4-11}$$
The polynomials
$$P_i = \frac{(x - t_0)\cdots(x - t_{i-1})(x - t_{i+1})\cdots(x - t_n)}{(t_i - t_0)\cdots(t_i - t_{i-1})(t_i - t_{i+1})\cdots(t_i - t_n)} = \prod_{j \neq i} \frac{x - t_j}{t_i - t_j} \tag{4-12}$$
are of degree n, hence belong to V, and by Theorem 2, they satisfy (4-11).
If $f = \sum_i c_i P_i$, then for each j
$$f(t_j) = \sum_i c_i P_i(t_j) = c_j. \tag{4-13}$$
Since the 0-polynomial has the property that $0(t) = 0$ for each t in F, it
follows from (4-13) that the polynomials $P_0, P_1, \dots, P_n$ are linearly
independent. The polynomials $1, x, \dots, x^n$ form a basis of V and hence the
dimension of V is (n + 1). So, the independent set $\{P_0, P_1, \dots, P_n\}$
must also be a basis for V. Thus for each f in V
$$f = \sum_{i=0}^{n} f(t_i) P_i. \tag{4-14}$$
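Formulas (4-12) and (4-14) translate directly into exact arithmetic over the rationals. The sketch below (hypothetical helper names; polynomials stored as coefficient lists $[c_0, c_1, \dots]$) builds each $P_i$, checks (4-11), and recovers a sample cubic from its values at four points. It is a direct transcription for illustration, not an efficient interpolation routine.

```python
from fractions import Fraction


def poly_mul(f, g):
    prod = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            prod[i + j] += a * b
    return prod


def evaluate(p, t):
    return sum(c * t ** k for k, c in enumerate(p))


def lagrange_basis(ts, i):
    """Coefficients of P_i = prod_{j != i} (x - t_j) / (t_i - t_j), formula (4-12)."""
    p = [Fraction(1)]
    for j, tj in enumerate(ts):
        if j != i:
            denom = Fraction(ts[i] - tj)
            p = poly_mul(p, [-tj / denom, 1 / denom])  # the factor (x - t_j)/(t_i - t_j)
    return p


def interpolate(ts, values):
    """Lagrange's formula (4-14): f = sum_i f(t_i) P_i; the t_i must be distinct."""
    f = [Fraction(0)] * len(ts)
    for i, v in enumerate(values):
        for k, c in enumerate(lagrange_basis(ts, i)):
            f[k] += v * c
    return f


ts = [0, 1, 2, 3]
values = [1, 0, 5, 22]           # values of x^3 - 2x + 1 at the points ts
print(interpolate(ts, values))   # coefficients 1, -2, 0, 1 (as Fractions)
```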


The expression (4-14) is called Lagrange’s interpolation formula. Setting
$f = x^j$ in (4-14) we obtain
$$x^j = \sum_{i=0}^{n} (t_i)^j P_i.$$
Now from Theorem 7 of Chapter 2 it follows that the matrix
$$\begin{bmatrix} 1 & t_0 & t_0^2 & \cdots & t_0^n \\ 1 & t_1 & t_1^2 & \cdots & t_1^n \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_n & t_n^2 & \cdots & t_n^n \end{bmatrix} \tag{4-15}$$
is invertible. The matrix in (4-15) is called a Vandermonde matrix; it
is an interesting exercise to show directly that such a matrix is invertible,
when $t_0, t_1, \dots, t_n$ are n + 1 distinct elements of F.
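One direct route to the invertibility of (4-15) is the classical formula $\det = \prod_{0 \le i < j \le n}(t_j - t_i)$ for the Vandermonde determinant, which is non-zero exactly when the $t_i$ are distinct. That formula is not proved in this section; the sketch below merely checks it on one arbitrarily chosen set of points, computing the determinant by exact Gaussian elimination (hypothetical helpers `det` and `vandermonde`).

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Determinant by Gaussian elimination over the rationals."""
    A = [[Fraction(x) for x in row] for row in M]
    n, d = len(A), Fraction(1)
    for c in range(n):
        p = next((i for i in range(c, n) if A[i][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            A[c], A[p] = A[p], A[c]
            d = -d  # a row swap changes the sign
        d *= A[c][c]
        for i in range(c + 1, n):
            factor = A[i][c] / A[c][c]
            A[i] = [a - factor * b for a, b in zip(A[i], A[c])]
    return d


def vandermonde(ts):
    """Row i is (1, t_i, t_i^2, ..., t_i^n), as in (4-15)."""
    return [[t ** k for k in range(len(ts))] for t in ts]


ts = [2, -1, 3, 5]                 # arbitrary distinct sample points
lhs = det(vandermonde(ts))
rhs = 1
for i, j in combinations(range(len(ts)), 2):   # all pairs i < j
    rhs *= ts[j] - ts[i]
print(lhs, rhs)                    # both -432
```

With a repeated point, two rows coincide and the determinant is 0, matching the requirement that the $t_i$ be distinct.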

If f is any polynomial over F we shall, in our present discussion, denote
by $f^\sim$ the polynomial function from F into F taking each t in F into
f(t). By definition (cf. Example 4, Chapter 2) every polynomial function
arises in this way; however, it may happen that $f^\sim = g^\sim$ for two polynomials
f and g such that $f \neq g$. Fortunately, as we shall see, this unpleasant
situation only occurs in the case where F is a field having only
a finite number of distinct elements. In order to describe in a precise way
the relation between polynomials and polynomial functions, we need to
define the product of two polynomial functions. If f, g are polynomials
over F, the product of $f^\sim$ and $g^\sim$ is the function $f^\sim g^\sim$ from F into F given by
$$(f^\sim g^\sim)(t) = f^\sim(t)\,g^\sim(t), \qquad t \text{ in } F. \tag{4-16}$$
By part (ii) of Theorem 2, $(fg)(t) = f(t)g(t)$, and hence
$$(fg)^\sim(t) = f^\sim(t)\,g^\sim(t)$$
for each t in F. Thus $f^\sim g^\sim = (fg)^\sim$, and is a polynomial function. At this
point it is a straightforward matter, which we leave to the reader, to verify
that the vector space of polynomial functions over F becomes a linear
algebra with identity over F if multiplication is defined by (4-16).
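The ‘unpleasant situation’ mentioned above is easy to exhibit. When p is a prime, the integers modulo p form a field with p elements, and by Fermat's little theorem $t^p = t$ for every t in it, so the non-zero polynomial $x^p - x$ and the 0-polynomial define the same polynomial function. The sketch below (a hypothetical helper `poly_function`) tabulates both functions for p = 3 and p = 5.

```python
def poly_function(coeffs, p):
    """Table of the polynomial function t -> f(t) on the field of integers mod p
    (p prime), listed at t = 0, 1, ..., p - 1. coeffs = [f_0, f_1, ..., f_n]."""
    return [sum(c * pow(t, k, p) for k, c in enumerate(coeffs)) % p for t in range(p)]


f3 = [0, -1, 0, 1]           # x^3 - x, a non-zero polynomial
f5 = [0, -1, 0, 0, 0, 1]     # x^5 - x
zero = []                    # the 0-polynomial

print(poly_function(f3, 3), poly_function(zero, 3))  # [0, 0, 0] [0, 0, 0]
print(poly_function(f5, 5))                          # [0, 0, 0, 0, 0]
```

Over the five-element field, by contrast, $x^3 - x$ takes the value 1 at $t = 2$, so there it is distinguished from the 0-polynomial.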


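Definition. Let F be a field and let $\mathfrak{A}$ and $\mathfrak{A}^\sim$ be linear algebras over F.
The algebras $\mathfrak{A}$ and $\mathfrak{A}^\sim$ are said to be isomorphic if there is a one-to-one
mapping $\alpha \to \alpha^\sim$ of $\mathfrak{A}$ onto $\mathfrak{A}^\sim$ such that

(a) $(c\alpha + d\beta)^\sim = c\alpha^\sim + d\beta^\sim$
(b) $(\alpha\beta)^\sim = \alpha^\sim\beta^\sim$

for all $\alpha, \beta$ in $\mathfrak{A}$ and all scalars c, d in F. The mapping $\alpha \to \alpha^\sim$ is called an
isomorphism of $\mathfrak{A}$ onto $\mathfrak{A}^\sim$. An isomorphism of $\mathfrak{A}$ onto $\mathfrak{A}^\sim$ is thus a
vector-space isomorphism of $\mathfrak{A}$ onto $\mathfrak{A}^\sim$ which has the additional property (b) of
‘preserving’ products.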


EXAMPLE 4. Let V be an n-dimensional vector space over the field F.
By Theorem 13 of Chapter 3 and subsequent remarks, each ordered basis
$\mathcal{B}$ of V determines an isomorphism $T \to [T]_{\mathcal{B}}$ of the algebra of linear
operators on V onto the algebra of $n \times n$ matrices over F. Suppose now
that U is a fixed linear operator on V and that we are given a polynomial
$$f = \sum_{i=0}^{n} c_i x^i$$
with coefficients $c_i$ in F. Then
$$f(U) = \sum_{i=0}^{n} c_i U^i$$
and since $T \to [T]_{\mathcal{B}}$ is a linear mapping
$$[f(U)]_{\mathcal{B}} = \sum_{i=0}^{n} c_i [U^i]_{\mathcal{B}}.$$
Now from the additional fact that
$$[T_1T_2]_{\mathcal{B}} = [T_1]_{\mathcal{B}}[T_2]_{\mathcal{B}}$$
for all $T_1, T_2$ in L(V, V) it follows that
$$[U^i]_{\mathcal{B}} = ([U]_{\mathcal{B}})^i, \qquad 2 \leq i \leq n.$$
As this relation is also valid for i = 0, 1 we obtain the result that
$$[f(U)]_{\mathcal{B}} = f([U]_{\mathcal{B}}). \tag{4-17}$$
In words, if U is a linear operator on V, the matrix of a polynomial in U,
in a given basis, is the same polynomial in the matrix of U.


Theorem 3. If F is a field containing an infinite number of distinct
elements, the mapping $f \to f^\sim$ is an isomorphism of the algebra of polynomials
over F onto the algebra of polynomial functions over F.

Proof. By definition, the mapping is onto, and if f, g belong to
F[x] it is evident that
$$(cf + dg)^\sim = cf^\sim + dg^\sim$$
for all scalars c and d. Since we have already shown that $(fg)^\sim = f^\sim g^\sim$, we
need only show that the mapping is one-to-one. To do this it suffices by
linearity to show that $f^\sim = 0$ implies $f = 0$. Suppose then that f is a
polynomial of degree n or less such that $f^\sim = 0$. Let $t_0, t_1, \dots, t_n$ be any n + 1
distinct elements of F. Since $f^\sim = 0$, $f(t_i) = 0$ for $i = 0, 1, \dots, n$, and it
is an immediate consequence of (4-14) that $f = 0$. ∎


From the results of the next section we shall obtain an altogether 
different proof of this theorem. 


Exercises 


1. Use the Lagrange interpolation formula to find a polynomial f with real
coefficients such that f has degree $\leq 3$ and $f(-1) = -6$, $f(0) = 2$, $f(1) = -2$,
$f(2) = 6$.


2. Let $\alpha, \beta, \gamma, \delta$ be real numbers. We ask when it is possible to find a polynomial f
over R, of degree not more than 2, such that $f(-1) = \alpha$, $f(1) = \beta$, $f(3) = \gamma$ and
$f(0) = \delta$. Prove that this is possible if and only if
$$3\alpha + 6\beta - \gamma - 8\delta = 0.$$
3. Let F be the field of real numbers,
$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 0 & 2 & 0 & 0 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix},$$
$$p = (x - 2)(x - 3)(x - 1).$$

(a) Show that p(A) = 0.

(b) Let $P_1, P_2, P_3$ be the Lagrange polynomials for $t_1 = 2$, $t_2 = 3$, $t_3 = 1$.
Compute $E_i = P_i(A)$, i = 1, 2, 3.

(c) Show that $E_1 + E_2 + E_3 = I$, $E_iE_j = 0$ if $i \neq j$, $E_i^2 = E_i$.

(d) Show that $A = 2E_1 + 3E_2 + E_3$.

4. Let $p = (x - 2)(x - 3)(x - 1)$ and let T be any linear operator on $R^4$ such
that p(T) = 0. Let $P_1, P_2, P_3$ be the Lagrange polynomials of Exercise 3, and let
$E_i = P_i(T)$, i = 1, 2, 3. Prove that
$$E_1 + E_2 + E_3 = I, \qquad E_iE_j = 0 \text{ if } i \neq j,$$
$$E_i^2 = E_i, \quad\text{and}\quad T = 2E_1 + 3E_2 + E_3.$$

5. Let n be a positive integer and F a field. Suppose A is an $n \times n$ matrix over F
and P is an invertible $n \times n$ matrix over F. If f is any polynomial over F, prove
that
$$f(P^{-1}AP) = P^{-1}f(A)P.$$

6. Let F be a field. We have considered certain special linear functionals on F[x]
obtained via ‘evaluation at t’:
$$L(f) = f(t).$$
Such functionals are not only linear but also have the property that $L(fg) = L(f)L(g)$.
Prove that if L is any linear functional on F[x] such that
$$L(fg) = L(f)L(g)$$
for all f and g, then either L = 0 or there is a t in F such that L(f) = f(t) for all f.


4.4. Polynomial Ideals


In this section we are concerned with results which depend primarily 
on the multiplicative structure of the algebra of polynomials over a field. 


Lemma. Suppose f and d are non-zero polynomials over a field F such
that $\deg d \leq \deg f$. Then there exists a polynomial g in F[x] such that either
$$f - dg = 0 \quad\text{or}\quad \deg(f - dg) < \deg f.$$
Proof. Suppose
$$f = a_m x^m + \sum_{i=0}^{m-1} a_i x^i, \qquad a_m \neq 0$$
and that
$$d = b_n x^n + \sum_{i=0}^{n-1} b_i x^i, \qquad b_n \neq 0.$$
Then $m \geq n$, and
$$f - \Big(\frac{a_m}{b_n}\Big)x^{m-n}d = 0 \quad\text{or}\quad \deg\Big[f - \Big(\frac{a_m}{b_n}\Big)x^{m-n}d\Big] < \deg f.$$
Thus we may take $g = \big(\frac{a_m}{b_n}\big)x^{m-n}$. ∎


Using this lemma we can show that the familiar process of ‘long 
division’ of polynomials with real or complex coefficients is possible over 
any field. 


Theorem 4. If f, d are polynomials over a field F and d is different
from 0 then there exist polynomials q, r in F[x] such that

(i) $f = dq + r$.

(ii) either $r = 0$ or $\deg r < \deg d$.

The polynomials q, r satisfying (i) and (ii) are unique.


Proof. If f is 0 or $\deg f < \deg d$ we may take $q = 0$ and $r = f$. In
case $f \neq 0$ and $\deg f \geq \deg d$, the preceding lemma shows we may choose
a polynomial g such that $f - dg = 0$ or $\deg(f - dg) < \deg f$. If $f - dg \neq 0$ and
$\deg(f - dg) \geq \deg d$ we choose a polynomial h such that
$(f - dg) - dh = 0$ or
$$\deg[f - d(g + h)] < \deg(f - dg).$$
Continuing this process as long as necessary, we ultimately obtain
polynomials q, r such that $r = 0$ or $\deg r < \deg d$, and $f = dq + r$. Now suppose
we also have $f = dq_1 + r_1$ where $r_1 = 0$ or $\deg r_1 < \deg d$. Then
$dq + r = dq_1 + r_1$, and $d(q - q_1) = r_1 - r$. If $q - q_1 \neq 0$ then $d(q - q_1) \neq 0$ and
$$\deg d + \deg(q - q_1) = \deg(r_1 - r).$$
But as the degree of $r_1 - r$ is less than the degree of d, this is impossible
and $q - q_1 = 0$. Hence also $r_1 - r = 0$. ∎
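The existence half of this proof is an algorithm: repeat the lemma's step, subtracting $(a_m/b_n)x^{m-n}d$ from the current remainder, until the remainder is 0 or has degree less than deg d. A minimal sketch over the rationals follows (hypothetical names `trim` and `divide`; coefficient lists $[c_0, c_1, \dots]$, with [] standing for the 0-polynomial).

```python
from fractions import Fraction


def trim(p):
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def divide(f, d):
    """Return (q, r) with f = d q + r and r = 0 or deg r < deg d (Theorem 4).
    Polynomials are coefficient lists [c_0, c_1, ...]; [] is the 0-polynomial."""
    f, d = [Fraction(c) for c in trim(f)], [Fraction(c) for c in trim(d)]
    if not d:
        raise ZeroDivisionError("d must be non-zero")
    q = [Fraction(0)] * max(len(f) - len(d) + 1, 0)
    r = f
    while r and len(r) >= len(d):
        # The lemma's step: subtract (a_m / b_n) x^(m-n) d to cancel the leading term.
        shift = len(r) - len(d)
        coeff = r[-1] / d[-1]
        q[shift] += coeff
        for i, b in enumerate(d):
            r[i + shift] -= coeff * b
        r = trim(r)
    return trim(q), r


# (x^3 - 2x^2 + 4) divided by (x - 3): quotient x^2 + x + 3, remainder 13
print(divide([4, 0, -2, 1], [-3, 1]))
```

The remainder 13 equals the value of $x^3 - 2x^2 + 4$ at $x = 3$, in line with Corollary 1.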


Definition. Let d be a non-zero polynomial over the field F. If f is in 
F[x], the preceding theorem shows there is at most one polynomial q in F[x] 
such that f = dq. If such a q exists we say that d divides f, that f is divisible 
by d, that f is a multiple of d, and call q the quotient of f and d. We 
also write q = f/d. 


Corollary 1. Let f be a polynomial over the field F, and let c be an element
of F. Then f is divisible by $x - c$ if and only if $f(c) = 0$.

Proof. By the theorem, $f = (x - c)q + r$ where r is a scalar
polynomial. By Theorem 2,
$$f(c) = 0\,q(c) + r(c) = r(c).$$
Hence $r = 0$ if and only if $f(c) = 0$. ∎


Definition. Let F be a field. An element c in F is said to be a root or 
a zero of a given polynomial f over F if f(c) = 0. 


Corollary 2. A polynomial f of degree n over a field F has at most n roots
in F.

Proof. The result is obviously true for polynomials of degree 0
and degree 1. We assume it to be true for polynomials of degree $n - 1$. If
a is a root of f, $f = (x - a)q$ where q has degree $n - 1$. Since $f(b) = 0$ if
and only if $a = b$ or $q(b) = 0$, it follows by our inductive assumption that
f has at most n roots. ∎


The reader should observe that the main step in the proof of Theorem 
3 follows immediately from this corollary. 

The formal derivatives of a polynomial are useful in discussing multiple
roots. The derivative of the polynomial
$$f = c_0 + c_1x + \cdots + c_nx^n$$
is the polynomial
$$f' = c_1 + 2c_2x + \cdots + nc_nx^{n-1}.$$
We also use the notation $Df = f'$. Differentiation is linear, that is, D is a
linear operator on F[x]. We have the higher order formal derivatives
$f'' = D^2f$, $f^{(3)} = D^3f$, and so on.


Theorem 5 (Taylor’s Formula). Let F be a field of characteristic
zero, c an element of F, and n a positive integer. If f is a polynomial over F
with $\deg f \leq n$, then
$$f = \sum_{k=0}^{n} \frac{(D^kf)(c)}{k!}(x - c)^k.$$


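Proof. Taylor’s formula is a consequence of the binomial theorem
and the linearity of the operators $D, D^2, \dots, D^n$. The binomial theorem
is easily proved by induction and asserts that
$$(a + b)^m = \sum_{k=0}^{m} \binom{m}{k} a^{m-k} b^k$$
where
$$\binom{m}{k} = \frac{m!}{k!\,(m-k)!} = \frac{m(m-1)\cdots(m-k+1)}{1 \cdot 2 \cdots k}$$
is the familiar binomial coefficient giving the number of combinations of
m objects taken k at a time. By the binomial theorem
$$\begin{aligned}
x^m &= [c + (x - c)]^m = \sum_{k=0}^{m} \binom{m}{k} c^{m-k}(x - c)^k \\
&= c^m + mc^{m-1}(x - c) + \cdots + (x - c)^m
\end{aligned}$$
and since $(D^kx^m)(c) = m(m-1)\cdots(m-k+1)\,c^{m-k}$ for $k \leq m$ (while $D^kx^m = 0$ for $k > m$),
we have $(D^kx^m)(c)/k! = \binom{m}{k}c^{m-k}$, so this is the statement of Taylor’s formula for the case $f = x^m$. If
$$f = \sum_{m=0}^{n} a_m x^m$$
then
$$(D^kf)(c) = \sum_{m} a_m (D^kx^m)(c)$$
and
$$\begin{aligned}
\sum_{k=0}^{n} \frac{(D^kf)(c)}{k!}(x - c)^k &= \sum_{k} \sum_{m} a_m \frac{(D^kx^m)(c)}{k!}(x - c)^k \\
&= \sum_{m} a_m \sum_{k} \frac{(D^kx^m)(c)}{k!}(x - c)^k \\
&= \sum_{m} a_m x^m \\
&= f.
\end{aligned}$$
∎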
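Over the rationals, a field of characteristic zero, Theorem 5 can be checked mechanically: compute $(D^kf)(c)/k!$ with the formal derivative, then expand $\sum_k a_k(x - c)^k$ back into powers of x with the binomial theorem. The sketch below does both for one sample cubic; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def derivative(p):
    """Formal derivative of c_0 + c_1 x + ... + c_n x^n, i.e. c_1 + 2 c_2 x + ... + n c_n x^(n-1)."""
    return [k * p[k] for k in range(1, len(p))]


def evaluate(p, t):
    return sum(c * t ** k for k, c in enumerate(p))


def taylor_coefficients(f, c):
    """[(D^k f)(c) / k! for k = 0, ..., deg f]."""
    coeffs, g = [], list(f)
    for k in range(len(f)):
        coeffs.append(Fraction(evaluate(g, c)) / factorial(k))
        g = derivative(g)  # g is now D^(k+1) f
    return coeffs


def expand_in_powers_of(a, c):
    """Rewrite sum_k a_k (x - c)^k in powers of x, using
    (x - c)^k = sum_j C(k, j) x^j (-c)^(k - j)."""
    out = [Fraction(0)] * len(a)
    for k, ak in enumerate(a):
        for j in range(k + 1):
            out[j] += ak * comb(k, j) * (-c) ** (k - j)
    return out


f = [2, -1, 3, 1]                  # f = 2 - x + 3x^2 + x^3
a = taylor_coefficients(f, 1)      # f = 5 + 8(x - 1) + 6(x - 1)^2 + (x - 1)^3
print(a, expand_in_powers_of(a, 1) == f)
```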


It should be noted that because the polynomials $1, (x - c), \dots, (x - c)^n$
are linearly independent (cf. Exercise 6, Section 4.2) Taylor’s
formula provides the unique method for writing f as a linear combination
of the polynomials $(x - c)^k$ $(0 \leq k \leq n)$.

Although we shall not give any details, it is perhaps worth mentioning
at this point that with the proper interpretation Taylor’s formula is also
valid for polynomials over fields of finite characteristic. If the field F has
finite characteristic (the sum of some finite number of 1’s in F is 0) then
we may have $k! = 0$ in F, in which case the division of $(D^kf)(c)$ by $k!$ is
meaningless. Nevertheless, sense can be made out of the division of $D^kf$
by $k!$, because every coefficient of $D^kf$ is an element of F multiplied by an
integer divisible by $k!$. If all of this seems confusing, we advise the reader
to restrict his attention to fields of characteristic 0 or to subfields of the
complex numbers.

If c is a root of the polynomial f, the multiplicity of c as a root of
f is the largest positive integer r such that $(x - c)^r$ divides f.

The multiplicity of a root is clearly less than or equal to the degree
of f. For polynomials over fields of characteristic zero, the multiplicity
of c as a root of f is related to the number of derivatives of f that are 0 at c.


Theorem 6. Let F be a field of characteristic zero and f a polynomial
over F with $\deg f \leq n$. Then the scalar c is a root of f of multiplicity r if and
only if
$$(D^kf)(c) = 0, \qquad 0 \leq k \leq r - 1$$
$$(D^rf)(c) \neq 0.$$


Proof. Suppose that r is the multiplicity of c as a root of f. Then
there is a polynomial g such that $f = (x - c)^r g$ and $g(c) \neq 0$. For otherwise
f would be divisible by $(x - c)^{r+1}$, by Corollary 1 of Theorem 4. By
Taylor’s formula applied to g
$$f = (x - c)^r\Big[\sum_{m=0}^{n-r} \frac{(D^mg)(c)}{m!}(x - c)^m\Big] = \sum_{m=0}^{n-r} \frac{(D^mg)(c)}{m!}(x - c)^{r+m}.$$
Since there is only one way to write f as a linear combination of the powers
$(x - c)^k$ $(0 \leq k \leq n)$ it follows that
$$\frac{(D^kf)(c)}{k!} = \begin{cases} 0 & \text{if } 0 \leq k \leq r - 1 \\ \dfrac{(D^{k-r}g)(c)}{(k - r)!} & \text{if } r \leq k \leq n. \end{cases}$$
Therefore, $(D^kf)(c) = 0$ for $0 \leq k \leq r - 1$, and $(D^rf)(c) = r!\,g(c) \neq 0$, since
$r! \neq 0$ in a field of characteristic zero. Conversely, if these conditions are satisfied,
it follows at once from Taylor’s formula that there is a polynomial g such that
$f = (x - c)^r g$ and $g(c) \neq 0$. Now suppose that r is not the largest positive integer
such that $(x - c)^r$ divides f. Then there is a polynomial h such that $f = (x - c)^{r+1}h$.
But this implies $g = (x - c)h$, by Corollary 2 of Theorem 1; hence $g(c) = 0$,
a contradiction. ∎
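Theorem 6 gives a second way to compute a multiplicity besides repeated division by $x - c$. The sketch below (hypothetical helper names; exact integer and rational arithmetic, so characteristic zero) computes both for $f = (x - 1)^3(x + 2) = x^4 - x^3 - 3x^2 + 5x - 2$ and confirms that they agree. Returning 0 when c is not a root is our own convention, not the text's.

```python
from fractions import Fraction


def derivative(p):
    return [k * p[k] for k in range(1, len(p))]


def evaluate(p, t):
    return sum(c * t ** k for k, c in enumerate(p))


def multiplicity_by_derivatives(f, c):
    """Least r with (D^r f)(c) != 0 (Theorem 6); f must be a non-zero polynomial.
    Returns 0 when f(c) != 0, i.e. when c is not a root (our convention)."""
    r, g = 0, list(f)
    while g and evaluate(g, c) == 0:
        g = derivative(g)
        r += 1
    return r


def multiplicity_by_division(f, c):
    """Largest r such that (x - c)^r divides f, by repeated synthetic division by x - c."""
    r, g = 0, [Fraction(a) for a in f]
    while len(g) > 1 and evaluate(g, c) == 0:
        # Here g = (x - c) q exactly; build q from the top coefficient down.
        n = len(g) - 1
        q = [Fraction(0)] * n
        q[n - 1] = g[n]
        for k in range(n - 1, 0, -1):
            q[k - 1] = g[k] + c * q[k]
        g, r = q, r + 1
    return r


f = [-2, 5, -3, -1, 1]   # (x - 1)^3 (x + 2) = x^4 - x^3 - 3x^2 + 5x - 2
for c in (1, -2, 0):
    print(c, multiplicity_by_derivatives(f, c), multiplicity_by_division(f, c))
```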








Definition. Let F be a field. An ideal in F[x] is a subspace M of 
F[x] such that fg belongs to M whenever f is in F [x] and g is in M. 


EXAMPLE 5. If F is a field and d is a polynomial over F, the set
$M = dF[x]$, of all multiples df of d by arbitrary f in F[x], is an ideal. For
M is non-empty, M in fact contains d. If f, g belong to F[x] and c is a
scalar, then
$$c(df) - dg = d(cf - g)$$
belongs to M, so that M is a subspace. Finally M contains $(df)g = d(fg)$
as well. The ideal M is called the principal ideal generated by d.


EXAMPLE 6. Let $d_1, \dots, d_n$ be a finite number of polynomials over F.
Then the sum M of the subspaces $d_iF[x]$ is a subspace and is also an ideal.
For suppose p belongs to M. Then there exist polynomials $f_1, \dots, f_n$ in
F[x] such that $p = d_1f_1 + \cdots + d_nf_n$. If g is an arbitrary polynomial
over F, then
$$pg = d_1(f_1g) + \cdots + d_n(f_ng)$$
so that pg also belongs to M. Thus M is an ideal, and we say that M is the
ideal generated by the polynomials $d_1, \dots, d_n$.
EXAMPLE 7. Let F be a subfield of the complex numbers, and consider the ideal
$$M = (x + 2)F[x] + (x^2 + 8x + 16)F[x].$$
We assert that M = F[x]. For M contains
$$x^2 + 8x + 16 - x(x + 2) = 6x + 16$$
and hence M contains $6x + 16 - 6(x + 2) = 4$. Since M is a subspace, it also
contains $\frac{1}{4} \cdot 4 = 1$. Thus the scalar polynomial 1 belongs to M as well as all its multiples.


Theorem 7. If F is a field, and M is any non-zero ideal in F[x], there
is a unique monic polynomial d in F[x] such that M is the principal ideal
generated by d.

Proof. By assumption, M contains a non-zero polynomial; among
all non-zero polynomials in M there is a polynomial d of minimal degree.
We may assume d is monic, for otherwise we can multiply d by a scalar to
make it monic. Now if f belongs to M, Theorem 4 shows that $f = dq + r$
where $r = 0$ or $\deg r < \deg d$. Since d is in M, dq and $f - dq = r$ also
belong to M. Because d is an element of M of minimal degree we cannot
have $\deg r < \deg d$, so $r = 0$. Thus $M = dF[x]$. If g is another monic
polynomial such that $M = gF[x]$, then there exist non-zero polynomials
p, q such that $d = gp$ and $g = dq$. Thus $d = dpq$ and
$$\deg d = \deg d + \deg p + \deg q.$$
Hence $\deg p = \deg q = 0$, and as d, g are monic, $p = q = 1$. Thus
$d = g$. ∎


It is worth observing that in the proof just given we have used a 
special case of a more general and rather useful fact; namely, if p is a non- 
zero polynomial in an ideal M and if f is a polynomial in M which is not 
divisible by p, then f = pq + r where the ‘remainder’ r belongs to M, is 
different from 0, and has smaller degree than p. We have already made 
use of this fact in Example 7 to show that the scalar polynomial 1 is the 
monic generator of the ideal considered there. In principle it is always 
possible to find the monic polynomial generating a given non-zero ideal. 
For one can ultimately obtain a polynomial in the ideal of minimal degree 
by a finite number of successive divisions. 
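The successive divisions just described are the Euclidean algorithm. The Python sketch below is only an illustration and is not part of the text. It works over the rational numbers, using `fractions.Fraction` for exact arithmetic, and it stores a polynomial as a list of coefficients with the constant term first. The helper names `trim`, `poly_rem`, and `monic_generator` are our own hypothetical choices.

```python
from fractions import Fraction

# A polynomial is a list of coefficients, constant term first:
# [16, 8, 1] stands for x^2 + 8x + 16. The zero polynomial is [].


def trim(p):
    """Convert coefficients to Fractions and drop trailing zeros."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_rem(f, d):
    """Remainder r in f = dq + r with r = 0 or deg r < deg d (d non-zero)."""
    r, d = trim(f), trim(d)
    if not d:
        raise ZeroDivisionError("division by the zero polynomial")
    while len(r) >= len(d):
        shift = len(r) - len(d)
        coef = r[-1] / d[-1]          # chosen to cancel the leading term of r
        for i, c in enumerate(d):
            r[i + shift] -= coef * c
        r = trim(r)
    return r


def monic_generator(*polys):
    """Monic generator of the ideal p1 Q[x] + ... + pn Q[x].

    Each step replaces a pair (a, b) by (b, remainder of a by b). Both pairs
    generate the same ideal and the degree of b drops, so the loop ends
    with a single generator.
    """
    d = []
    for p in polys:
        a, b = trim(p), d
        while b:
            a, b = b, poly_rem(a, b)
        d = a
    if not d:
        raise ValueError("all the polynomials are 0")
    return [c / d[-1] for c in d]


# Example 7: x + 2 and x^2 + 8x + 16 generate all of Q[x].
print(monic_generator([2, 1], [16, 8, 1]))   # [Fraction(1, 1)]
```

The same call with the three polynomials of Example 9 also returns the scalar polynomial 1.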


Corollary. If $p_1, \ldots, p_n$ are polynomials over a field $F$, not all of
which are 0, there is a unique monic polynomial $d$ in $F[x]$ such that


(a) $d$ is in the ideal generated by $p_1, \ldots, p_n$;

(b) $d$ divides each of the polynomials $p_i$.
Any polynomial satisfying (a) and (b) necessarily satisfies

(c) $d$ is divisible by every polynomial which divides each of the polynomials $p_1, \ldots, p_n$.


Proof. Let $d$ be the monic generator of the ideal
$$p_1F[x] + \cdots + p_nF[x].$$
Every member of this ideal is divisible by $d$; thus each of the polynomials
$p_i$ is divisible by $d$. Now suppose $f$ is a polynomial which divides each of
the polynomials $p_1, \ldots, p_n$. Then there exist polynomials $g_1, \ldots, g_n$
such that $p_i = fg_i$, $1 \le i \le n$. Also, since $d$ is in the ideal
$$p_1F[x] + \cdots + p_nF[x],$$
there exist polynomials $q_1, \ldots, q_n$ in $F[x]$ such that
$$d = p_1q_1 + \cdots + p_nq_n.$$
Thus
$$d = f[g_1q_1 + \cdots + g_nq_n].$$
We have shown that $d$ is a monic polynomial satisfying (a), (b), and (c).
If $d'$ is any polynomial satisfying (a) and (b), then (a) and the definition of $d$ show that $d$ divides $d'$, while (b) and property (c) of $d$ show that $d'$ divides $d$. Hence $d'$ is a scalar multiple of $d$ and satisfies (c) as well.
Finally, in case $d'$ is a monic polynomial, we have $d' = d$. ∎
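For two polynomials, the $q_1, q_2$ used in this proof can be computed explicitly. The method is to run the successive divisions while recording each remainder as a combination of $p_1$ and $p_2$. This is the extended Euclidean algorithm, which the text does not describe. The Python sketch below again assumes rational coefficients, listed constant term first, and all helper names are hypothetical.

```python
from fractions import Fraction


def trim(p):
    """Fractions, constant term first, no trailing zeros; [] is 0."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def add(f, g):
    n = max(len(f), len(g))
    return trim([(f[i] if i < len(f) else 0) + (g[i] if i < len(g) else 0)
                 for i in range(n)])


def scale(c, f):
    return trim([c * a for a in f])


def mul(f, g):
    if not f or not g:
        return []
    out = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return trim(out)


def divmod_poly(f, d):
    """Return (q, r) with f = dq + r and r = 0 or deg r < deg d."""
    r, d = trim(f), trim(d)
    if not d:
        raise ZeroDivisionError("division by the zero polynomial")
    q = [Fraction(0)] * max(len(r) - len(d) + 1, 0)
    while len(r) >= len(d):
        shift = len(r) - len(d)
        coef = r[-1] / d[-1]
        q[shift] = coef
        r = add(r, scale(-coef, [0] * shift + d))
    return trim(q), r


def gcd_with_cofactors(p1, p2):
    """Return (d, q1, q2) with d the monic g.c.d. and d = p1*q1 + p2*q2."""
    # Invariant: r0 = p1*s0 + p2*t0 and r1 = p1*s1 + p2*t1.
    r0, s0, t0 = trim(p1), [Fraction(1)], []
    r1, s1, t1 = trim(p2), [], [Fraction(1)]
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(-1, mul(q, s1)))
        t0, t1 = t1, add(t0, scale(-1, mul(q, t1)))
    if not r0:
        raise ValueError("both polynomials are 0")
    lead = r0[-1]
    return scale(1 / lead, r0), scale(1 / lead, s0), scale(1 / lead, t0)


# Example 7: d = 1 = (x + 2)(-3/2 - x/4) + (x^2 + 8x + 16)(1/4).
d, q1, q2 = gcd_with_cofactors([2, 1], [16, 8, 1])
print(d, q1, q2)
```

The returned cofactors are one valid choice. Adding any multiple of $p_2$ to $q_1$ and subtracting the matching multiple of $p_1$ from $q_2$ gives another choice.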


Definition. If $p_1, \ldots, p_n$ are polynomials over a field $F$, not all of
which are 0, the monic generator $d$ of the ideal
$$p_1F[x] + \cdots + p_nF[x]$$
is called the greatest common divisor (g.c.d.) of $p_1, \ldots, p_n$. This
terminology is justified by the preceding corollary. We say that the polynomials $p_1, \ldots, p_n$ are relatively prime if their greatest common divisor
is 1, or equivalently if the ideal they generate is all of $F[x]$.


Example 8. Let $C$ be the field of complex numbers. Then
(a) g.c.d. $(x + 2, x^2 + 8x + 16) = 1$ (see Example 7);
(b) g.c.d. $((x - 2)^2(x + i), (x - 2)(x^2 + 1)) = (x - 2)(x + i)$. For
the ideal
$$(x - 2)^2(x + i)C[x] + (x - 2)(x^2 + 1)C[x]$$
contains
$$(x - 2)^2(x + i) - (x - 2)(x^2 + 1) = (x - 2)(x + i)(i - 2).$$
Since $i - 2 \neq 0$, the ideal contains $(x - 2)(x + i)$, which is monic and divides both
$(x - 2)^2(x + i)$ and $(x - 2)(x^2 + 1) = (x - 2)(x + i)(x - i)$.
Example 9. Let $F$ be the field of rational numbers and in $F[x]$ let
$M$ be the ideal generated by
$$(x - 1)(x + 2)^2, \quad (x + 2)^2(x - 3), \quad \text{and} \quad (x - 3).$$
Then $M$ contains
$$\tfrac{1}{2}(x + 2)^2[(x - 1) - (x - 3)] = (x + 2)^2$$
and since
$$(x + 2)^2 = (x - 3)(x + 7) + 25,$$
$M$ contains the non-zero scalar 25 and hence the scalar polynomial 1. Thus $M = F[x]$ and the polynomials
$$(x - 1)(x + 2)^2, \quad (x + 2)^2(x - 3), \quad \text{and} \quad (x - 3)$$
are relatively prime.


Exercises 


1. Let Q be the field of rational numbers. Determine which of the following subsets 
of Q[x] are ideals. When the set is an ideal, find its monic generator. 
(a) all f of even degree; 
(b) all f of degree > 5; 
(c) all f such that f(0) = 0; 
(d) all f such that f(2) = f(4) = 0; 
(e) all f in the range of the linear operator T defined by 


2. Find the g.c.d. of each of the following pairs of polynomials:
(a) $2x^5 - x^3 - 3x^2 - 6x + 4$, $x^4 + x^3 - x^2 - 2x - 2$;
(b) $3x^4 + 8x^2 - 3$, $x^3 + 2x^2 + 3x + 6$;
(c) $x^4 - 2x^3 - 2x^2 - 2x - 3$, $x^3 + 6x^2 + 7x + 1$.


3. Let $A$ be an $n \times n$ matrix over a field $F$. Show that the set of all polynomials
$f$ in $F[x]$ such that $f(A) = 0$ is an ideal.


4. Let F be a subfield of the complex numbers, and let 


asio 3} 


Find the monic generator of the ideal of all polynomials $f$ in $F[x]$ such that
$f(A) = 0$.


5. Let $F$ be a field. Show that the intersection of any number of ideals in $F[x]$
is an ideal.


6. Let $F$ be a field. Show that the ideal generated by a finite number of polynomials $f_1, \ldots, f_n$ in $F[x]$ is the intersection of all ideals containing $f_1, \ldots, f_n$.


7. Let $K$ be a subfield of a field $F$, and suppose $f, g$ are polynomials in $K[x]$.
Let $M_K$ be the ideal generated by $f$ and $g$ in $K[x]$ and $M_F$ be the ideal they generate
in $F[x]$. Show that $M_K$ and $M_F$ have the same monic generator.


4.5. The Prime Factorization of a Polynomial


In this section we shall prove that each polynomial over the field F 
can be written as a product of ‘prime’ polynomials. This factorization 
provides us with an effective tool for finding the greatest common divisor 
of a finite number of polynomials, and in particular, provides an effective
means for deciding when the polynomials are relatively prime. 


Definition. Let $F$ be a field. A polynomial $f$ in $F[x]$ is said to be
reducible over $F$ if there exist polynomials $g, h$ in $F[x]$ of degree $\geq 1$ such
that $f = gh$, and if not, $f$ is said to be irreducible over $F$. A non-scalar
irreducible polynomial over $F$ is called a prime polynomial over $F$, and we
sometimes say it is a prime in $F[x]$.


Example 10. The polynomial $x^2 + 1$ is reducible over the field $C$ of
complex numbers. For
$$x^2 + 1 = (x + i)(x - i)$$
and the polynomials $x + i$, $x - i$ belong to $C[x]$. On the other hand,
$x^2 + 1$ is irreducible over the field $R$ of real numbers. For if
$$x^2 + 1 = (ax + b)(a'x + b')$$
with $a, a', b, b'$ in $R$, then
$$aa' = 1, \quad ab' + ba' = 0, \quad bb' = 1.$$
Multiplying the middle relation by $ab$ and using the other two, these relations imply $a^2 + b^2 = 0$. This is impossible with real numbers
$a$ and $b$ unless $a = b = 0$, and $aa' = 1$ rules out $a = 0$.


Theorem 8. Let p, f, and g be polynomials over the field F. Suppose 
that p is a prime polynomial and that p divides the product fg. Then either p 
divides f or p divides g. 


Proof. It is no loss of generality to assume that $p$ is a monic prime
polynomial. The fact that $p$ is prime then simply says that the only monic
divisors of $p$ are 1 and $p$. Let $d$ be the g.c.d. of $f$ and $p$. Then either
$d = 1$ or $d = p$, since $d$ is a monic polynomial which divides $p$. If $d = p$,
then $p$ divides $f$ and we are done. So suppose $d = 1$, i.e., suppose $f$ and $p$
are relatively prime. We shall prove that $p$ divides $g$. Since $(f, p) = 1$,
there are polynomials $f_0$ and $p_0$ such that $1 = f_0f + p_0p$. Multiplying by $g$,
we obtain
$$\begin{aligned} g &= f_0fg + p_0pg \\ &= (fg)f_0 + p(p_0g). \end{aligned}$$
Since $p$ divides $fg$ it divides $(fg)f_0$, and certainly $p$ divides $p(p_0g)$. Thus
$p$ divides $g$. ∎


Corollary. If $p$ is a prime and divides a product $f_1 \cdots f_n$, then $p$ divides
one of the polynomials $f_1, \ldots, f_n$.
Proof. The proof is by induction. When $n = 2$, the result is simply
the statement of Theorem 8. Suppose we have proved the corollary for
$n = k$, and that $p$ divides the product $f_1 \cdots f_{k+1}$ of some $(k + 1)$ polynomials. Since $p$ divides $(f_1 \cdots f_k)f_{k+1}$, either $p$ divides $f_{k+1}$ or $p$ divides
$f_1 \cdots f_k$. By the induction hypothesis, if $p$ divides $f_1 \cdots f_k$, then $p$ divides
$f_j$ for some $j$, $1 \le j \le k$. So we see that in any case $p$ must divide some $f_j$,
$1 \le j \le k + 1$. ∎


Theorem 9. If $F$ is a field, a non-scalar monic polynomial in $F[x]$ can
be factored as a product of monic primes in $F[x]$ in one and, except for order,
only one way.


Proof. Suppose $f$ is a non-scalar monic polynomial over $F$. As
polynomials of degree one are irreducible, there is nothing to prove if
$\deg f = 1$. Suppose $f$ has degree $n > 1$. By induction we may assume the
theorem is true for all non-scalar monic polynomials of degree less than $n$.
If $f$ is irreducible, it is already factored as a product of monic primes, and
otherwise $f = gh$ where $g$ and $h$ are non-scalar monic polynomials of
degree less than $n$. Thus $g$ and $h$ can be factored as products of monic
primes in $F[x]$ and hence so can $f$. Now suppose
$$f = p_1 \cdots p_m = q_1 \cdots q_n$$
where $p_1, \ldots, p_m$ and $q_1, \ldots, q_n$ are monic primes in $F[x]$. Then $p_m$
divides the product $q_1 \cdots q_n$. By the above corollary, $p_m$ must divide
some $q_i$. Since $q_i$ and $p_m$ are both monic primes, this means that
$$q_i = p_m. \tag{4-16}$$
From (4-16) we see that $m = n = 1$ if either $m = 1$ or $n = 1$. For
$$\deg f = \sum_{i=1}^{m} \deg p_i = \sum_{j=1}^{n} \deg q_j.$$
In this case there is nothing more to prove, so we may assume $m > 1$ and
$n > 1$. By rearranging the $q$'s we can then assume $p_m = q_n$, and that
$$p_1 \cdots p_{m-1}p_m = q_1 \cdots q_{n-1}p_m.$$
Now by Corollary 2 of Theorem 1 it follows that
$$p_1 \cdots p_{m-1} = q_1 \cdots q_{n-1}.$$
As the polynomial $p_1 \cdots p_{m-1}$ has degree less than $\deg f$, our inductive
assumption applies and shows that the sequence $q_1, \ldots, q_{n-1}$ is at most
a rearrangement of the sequence $p_1, \ldots, p_{m-1}$. This together with (4-16)
shows that the factorization of $f$ as a product of monic primes is unique
up to the order of the factors. ∎


In the above factorization of a given non-scalar monic polynomial $f$,
some of the monic prime factors may be repeated. If $p_1, p_2, \ldots, p_r$ are
the distinct monic primes occurring in this factorization of $f$, then
$$f = p_1^{n_1}p_2^{n_2} \cdots p_r^{n_r}, \tag{4-17}$$
the exponent $n_i$ being the number of times the prime $p_i$ occurs in the
factorization. This decomposition is also clearly unique, and is called
the primary decomposition of $f$. It is easily verified that every monic
divisor of $f$ has the form
$$p_1^{m_1}p_2^{m_2} \cdots p_r^{m_r}, \quad 0 \le m_i \le n_i. \tag{4-18}$$


From (4-18) it follows that the g.c.d. of a finite number of non-scalar
monic polynomials $f_1, \ldots, f_s$ is obtained by combining all those monic
primes which occur simultaneously in the factorizations of $f_1, \ldots, f_s$.
The exponent to which each prime is to be taken is the largest for which
the corresponding prime power is a factor of each $f_i$. If no (non-trivial)
prime power is a factor of each $f_i$, the polynomials are relatively prime.


Example 11. Suppose $F$ is a field, and let $a, b, c$ be distinct elements
of $F$. Then the polynomials $x - a$, $x - b$, $x - c$ are distinct monic primes
in $F[x]$. If $m$, $n$, and $s$ are positive integers, $(x - c)^s$ is the g.c.d. of the
polynomials
$$(x - b)^n(x - c)^s \quad \text{and} \quad (x - a)^m(x - c)^s$$
whereas the three polynomials
$$(x - b)^n(x - c)^s, \quad (x - a)^m(x - c)^s, \quad (x - a)^m(x - b)^n$$
are relatively prime.
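The rule above reads the g.c.d. directly off the primary decompositions. The Python sketch below assumes the factorizations are already known. Each one is given as a dictionary that maps a label for each monic prime to its positive exponent. The string labels such as `"x-c"` are hypothetical stand-ins, not actual polynomial objects, and the function name is our own.

```python
def gcd_from_factorizations(*factorizations):
    """g.c.d. of monic polynomials given by their primary decompositions.

    A prime survives only if it occurs in every factorization, and it is
    raised to the smallest exponent with which it occurs. An empty result
    means the g.c.d. is 1, i.e. the polynomials are relatively prime.
    """
    common = set(factorizations[0])
    for fac in factorizations[1:]:
        common &= set(fac)
    return {p: min(fac[p] for fac in factorizations) for p in common}


# Example 11 with the sample exponents m, n, s = 3, 2, 4.
m, n, s = 3, 2, 4
f1 = {"x-b": n, "x-c": s}        # (x - b)^n (x - c)^s
f2 = {"x-a": m, "x-c": s}        # (x - a)^m (x - c)^s
f3 = {"x-a": m, "x-b": n}        # (x - a)^m (x - b)^n
print(gcd_from_factorizations(f1, f2))      # {'x-c': 4}
print(gcd_from_factorizations(f1, f2, f3))  # {}
```

The hard part in practice is computing the factorizations themselves. When they are not available, the division procedure that follows Theorem 7 gives the g.c.d. without them.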


Theorem 10. Let $f$ be a non-scalar monic polynomial over the field $F$
and let
$$f = p_1^{n_1} \cdots p_k^{n_k}$$
be the prime factorization of $f$. For each $j$, $1 \le j \le k$, let
$$f_j = f/p_j^{n_j} = \prod_{i \neq j} p_i^{n_i}.$$
Then $f_1, \ldots, f_k$ are relatively prime.


Proof. We leave the (easy) proof of this to the reader. We have
stated this theorem largely because we wish to refer to it later. ∎


Theorem 11. Let f be a polynomial over the field F with derivative f’. 
Then f is a product of distinct irreducible polynomials over F if and only if 
f and f’ are relatively prime. 


Proof. Suppose in the prime factorization of $f$ over the field $F$
that some (non-scalar) prime polynomial $p$ is repeated. Then $f = p^2h$ for
some $h$ in $F[x]$. Then
$$f' = p^2h' + 2pp'h$$
and $p$ is also a divisor of $f'$. Hence $f$ and $f'$ are not relatively prime.
Now suppose $f = p_1 \cdots p_k$, where $p_1, \ldots, p_k$ are distinct non-scalar
irreducible polynomials over $F$. Let $f_j = f/p_j$. Then
$$f' = p_1'f_1 + p_2'f_2 + \cdots + p_k'f_k.$$
Let $p$ be a prime polynomial which divides both $f$ and $f'$. Then $p = p_i$ for
some $i$. Now $p_i$ divides $f_j$ for $j \neq i$, and since $p_i$ also divides
$$f' = \sum_{j=1}^{k} p_j'f_j$$
we see that $p_i$ must divide $p_i'f_i$. Therefore $p_i$ divides either $f_i$ or $p_i'$. But $p_i$
does not divide $f_i$ since $p_1, \ldots, p_k$ are distinct. So $p_i$ divides $p_i'$. This is
not possible, since $p_i'$ has degree one less than the degree of $p_i$. We conclude that no prime divides both $f$ and $f'$, that is, $f$ and $f'$ are relatively
prime. ∎
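Theorem 11 gives a test for repeated prime factors that needs only divisions, not a factorization. The Python sketch below assumes coefficients in the rational numbers, listed constant term first. Over the rationals a non-scalar polynomial has a non-zero derivative of one lower degree, which is the property the last step of the proof uses. All function names are hypothetical.

```python
from fractions import Fraction

# Coefficients in Q, constant term first: [-2, 5, -4, 1] is x^3 - 4x^2 + 5x - 2.


def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p


def poly_rem(f, d):
    """Remainder of f on division by the non-zero polynomial d."""
    r, d = trim(f), trim(d)
    while len(r) >= len(d):
        shift = len(r) - len(d)
        coef = r[-1] / d[-1]
        for i, c in enumerate(d):
            r[i + shift] -= coef * c
        r = trim(r)
    return r


def monic_gcd(f, g):
    """Monic g.c.d. of f and g (not both 0) by successive divisions."""
    f, g = trim(f), trim(g)
    while g:
        f, g = g, poly_rem(f, g)
    return [c / f[-1] for c in f]


def derivative(f):
    """Formal derivative: sum of c_k x^k maps to sum of k c_k x^(k-1)."""
    return trim([k * c for k, c in enumerate(f)][1:])


def is_product_of_distinct_primes(f):
    """Theorem 11 over Q: a non-scalar f has no repeated prime factor
    exactly when f and f' are relatively prime."""
    return monic_gcd(f, derivative(f)) == [1]


print(is_product_of_distinct_primes([-2, 5, -4, 1]))   # (x-1)^2 (x-2): False
print(is_product_of_distinct_primes([-6, 11, -6, 1]))  # (x-1)(x-2)(x-3): True
```

When the test fails, the g.c.d. of $f$ and $f'$ itself exposes the repeated factor. For $(x - 1)^2(x - 2)$ it is $x - 1$.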


Definition. The field F is called algebraically closed if every prime 
polynomial over F has degree 1. 


To say that $F$ is algebraically closed means every non-scalar irreducible monic polynomial over $F$ is of the form $(x - c)$. We have already
observed that each such polynomial is irreducible for any $F$. Accordingly,
an equivalent definition of an algebraically closed field is a field $F$ such
that each non-scalar polynomial $f$ in $F[x]$ can be expressed in the form
$$f = c(x - c_1)^{n_1} \cdots (x - c_k)^{n_k}$$
where $c$ is a scalar, $c_1, \ldots, c_k$ are distinct elements of $F$, and $n_1, \ldots, n_k$
are positive integers. Still another formulation is that if $f$ is a non-scalar
polynomial over $F$, then there is an element $c$ in $F$ such that $f(c) = 0$.

The field $R$ of real numbers is not algebraically closed, since the polynomial $(x^2 + 1)$ is irreducible over $R$ but not of degree 1, or, because
there is no real number $c$ such that $c^2 + 1 = 0$. The so-called Fundamental Theorem of Algebra states that the field $C$ of complex numbers is
algebraically closed. We shall not prove this theorem, although we shall 
use it somewhat later in this book. The proof is omitted partly because 
of the limitations of time and partly because the proof depends upon a 
‘non-algebraic’ property of the system of real numbers. For one possible 
proof the interested reader may consult the book by Schreier and Sperner 
in the Bibliography. 

The Fundamental Theorem of Algebra also makes it clear what the
possibilities are for the prime factorization of a polynomial with real
coefficients. If $f$ is a polynomial with real coefficients and $c$ is a complex
root of $f$, then the complex conjugate $\bar{c}$ is also a root of $f$. Therefore, those
complex roots which are not real must occur in conjugate pairs, and the
entire set of roots has the form $\{t_1, \ldots, t_k, c_1, \bar{c}_1, \ldots, c_r, \bar{c}_r\}$ where $t_1, \ldots, t_k$
are real and $c_1, \ldots, c_r$ are non-real complex numbers. Thus $f$ factors
$$f = c(x - t_1) \cdots (x - t_k)p_1 \cdots p_r$$
where $p_i$ is the quadratic polynomial
$$p_i = (x - c_i)(x - \bar{c}_i) = x^2 - (c_i + \bar{c}_i)x + c_i\bar{c}_i.$$
These polynomials $p_i$ have real coefficients, since $c_i + \bar{c}_i$ and $c_i\bar{c}_i$ are real. We conclude that every
irreducible polynomial over the real number field has degree 1 or 2. Each
polynomial over $R$ is the product of certain linear factors, obtained from
the real roots of $f$, and certain irreducible quadratic polynomials.
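The claim that each $p_i$ has real coefficients can be checked on a sample root. The Python sketch below uses the built-in `complex` type. Because `complex` is floating point, the exact zero imaginary parts shown are only guaranteed for small integer inputs such as the one used here, which is an assumption of this example. The function name is hypothetical.

```python
def conjugate_pair_quadratic(c):
    """Coefficients (constant term first) of (x - c)(x - conj(c)),
    together with the imaginary parts that should vanish."""
    total = c + c.conjugate()      # c + c-bar = 2 Re(c), a real number
    product = c * c.conjugate()    # c * c-bar = |c|^2, a real number
    return [product.real, -total.real, 1.0], (total.imag, product.imag)


coeffs, imaginary_parts = conjugate_pair_quadratic(2 + 3j)
print(coeffs)            # [13.0, -4.0, 1.0], i.e. x^2 - 4x + 13
print(imaginary_parts)   # (0.0, 0.0)
```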


Exercises 


1. Let $p$ be a monic polynomial over the field $F$, and let $f$ and $g$ be relatively
prime polynomials over $F$. Prove that the g.c.d. of $pf$ and $pg$ is $p$.


2. Assuming the Fundamental Theorem of Algebra, prove the following. If $f$ and
$g$ are polynomials over the field of complex numbers, then g.c.d.$(f, g) = 1$ if and
only if $f$ and $g$ have no common root.


3. Let $D$ be the differentiation operator on the space of polynomials over the
field of complex numbers. Let $f$ be a monic polynomial over the field of complex
numbers. Prove that
$$f = (x - c_1) \cdots (x - c_k)$$
where $c_1, \ldots, c_k$ are distinct complex numbers if and only if $f$ and $Df$ are relatively
prime. In other words, $f$ has no repeated root if and only if $f$ and $Df$ have no common root. (Assume the Fundamental Theorem of Algebra.)


4. Prove the following generalization of Taylor's formula. Let $f$, $g$, and $h$ be
polynomials over a subfield of the complex numbers, with $\deg f \le n$. Then
$$f(g) = \sum_{k=0}^{n} \frac{f^{(k)}(h)}{k!}(g - h)^k.$$
(Here $f(g)$ denotes 'f of g.')

For the remaining exercises, we shall need the following definition. If $f$, $g$,
and $p$ are polynomials over the field $F$ with $p \neq 0$, we say that $f$ is congruent to $g$
modulo $p$ if $(f - g)$ is divisible by $p$. If $f$ is congruent to $g$ modulo $p$, we write
$$f \equiv g \bmod p.$$
5. Prove, for any non-zero polynomial $p$, that congruence modulo $p$ is an equivalence relation.

(a) It is reflexive: $f \equiv f \bmod p$.

(b) It is symmetric: if $f \equiv g \bmod p$, then $g \equiv f \bmod p$.

(c) It is transitive: if $f \equiv g \bmod p$ and $g \equiv h \bmod p$, then $f \equiv h \bmod p$.


6. Suppose $f \equiv g \bmod p$ and $f_1 \equiv g_1 \bmod p$.
(a) Prove that $f + f_1 \equiv g + g_1 \bmod p$.
(b) Prove that $ff_1 \equiv gg_1 \bmod p$.


7. Use Exercise 6 to prove the following. If $f$, $g$, $h$, and $p$ are polynomials over the
field $F$ and $p \neq 0$, and if $f \equiv g \bmod p$, then $h(f) \equiv h(g) \bmod p$.


8. If $p$ is an irreducible polynomial and $fg \equiv 0 \bmod p$, prove that either
$f \equiv 0 \bmod p$ or $g \equiv 0 \bmod p$. Give an example which shows that this is false if $p$
is not irreducible.




5. Determinants 


5.1. Commutative Rings 


In this chapter we shall prove the essential facts about determinants 
of square matrices. We shall do this not only for matrices over a field, but 
also for matrices with entries which are ‘scalars’ of a more general type. 
There are two reasons for this generality. First, at certain points in the 
next chapter, we shall find it necessary to deal with determinants of 
matrices with polynomial entries. Second, in the treatment of determi- 
nants which we present, one of the axioms for a field plays no role, namely, 
the axiom which guarantees a multiplicative inverse for each non-zero 
element. For these reasons, it is appropriate to develop the theory of 
determinants for matrices, the entries of which are elements from a com- 
mutative ring with identity. 


Definition. A ring is a set $K$, together with two operations $(x, y) \to x + y$ and $(x, y) \to xy$ satisfying


(a) $K$ is a commutative group under the operation $(x, y) \to x + y$ ($K$
is a commutative group under addition);

(b) $(xy)z = x(yz)$ (multiplication is associative);

(c) $x(y + z) = xy + xz$; $(y + z)x = yx + zx$ (the two distributive
laws hold).

If $xy = yx$ for all $x$ and $y$ in $K$, we say that the ring $K$ is commutative.
If there is an element 1 in $K$ such that $1x = x1 = x$ for each $x$, $K$ is said
to be a ring with identity, and 1 is called the identity for $K$.
We are interested here in commutative rings with identity. Such a
ring can be described briefly as a set $K$, together with two operations
which satisfy all the axioms for a field given in Chapter 1, except possibly
for axiom (8) and the condition $1 \neq 0$. Thus, a field is a commutative
ring with non-zero identity such that to each non-zero $x$ there corresponds
an element $x^{-1}$ with $xx^{-1} = 1$. The set of integers, with the usual operations, is a commutative ring with identity which is not a field. Another
commutative ring with identity is the set of all polynomials over a field,
together with the addition and multiplication which we have defined for
polynomials.

If $K$ is a commutative ring with identity, we define an $m \times n$ matrix
over $K$ to be a function $A$ from the set of pairs $(i, j)$ of integers, $1 \le i \le m$,
$1 \le j \le n$, into $K$. As usual we represent such a matrix by a rectangular
array having $m$ rows and $n$ columns. The sum and product of matrices
over $K$ are defined as for matrices over a field:

$$(A + B)_{ij} = A_{ij} + B_{ij}$$
$$(AB)_{ij} = \sum_k A_{ik}B_{kj}$$


the sum being defined when A and B have the same number of rows and 
the same number of columns, the product being defined when the number 
of columns of A is equal to the number of rows of B. The basic algebraic 
properties of these operations are again valid. For example, 


A(B+C) =AB+ AC, (AB)C = A(BC), etc. 


As in the case of fields, we shall refer to the elements of K as scalars. 
We may then define linear combinations of the rows or columns of a 
matrix as we did earlier. Roughly speaking, all that we previously did for 
matrices over a field is valid for matrices over K, excluding those results 
which depended upon the ability to ‘divide’ in K. 
5.2. Determinant Functions


Let $K$ be a commutative ring with identity. We wish to assign to
each $n \times n$ (square) matrix over $K$ a scalar (element of $K$) to be known
as the determinant of the matrix. It is possible to define the determinant
of a square matrix $A$ by simply writing down a formula for this determinant in terms of the entries of $A$. One can then deduce the various properties of determinants from this formula. However, such a formula is
rather complicated, and to gain some technical advantage we shall proceed
as follows. We shall define a 'determinant function' on $K^{n \times n}$ as a function
which assigns to each $n \times n$ matrix over $K$ a scalar, the function having
these special properties: it is linear as a function of each of the rows of the
matrix; its value is 0 on any matrix having two equal rows; and its value
on the $n \times n$ identity matrix is 1. We shall prove that such a function
exists, and then that it is unique, i.e., that there is precisely one such
function. As we prove the uniqueness, an explicit formula for the determinant will be obtained, along with many of its useful properties.

This section will be devoted to the definition of ‘determinant function’ 
and to the proof that at least one such function exists. 


Definition. Let $K$ be a commutative ring with identity, $n$ a positive
integer, and let $D$ be a function which assigns to each $n \times n$ matrix $A$ over $K$
a scalar $D(A)$ in $K$. We say that $D$ is n-linear if for each $i$, $1 \le i \le n$,
$D$ is a linear function of the $i$th row when the other $(n - 1)$ rows are held fixed.


This definition requires some clarification. If $D$ is a function from
$K^{n \times n}$ into $K$, and if $\alpha_1, \ldots, \alpha_n$ are the rows of the matrix $A$, let us also
write
$$D(A) = D(\alpha_1, \ldots, \alpha_n)$$
that is, let us also think of $D$ as the function of the rows of $A$. The statement that $D$ is n-linear then means
$$D(\alpha_1, \ldots, c\alpha_i + \alpha_i', \ldots, \alpha_n) = cD(\alpha_1, \ldots, \alpha_i, \ldots, \alpha_n) + D(\alpha_1, \ldots, \alpha_i', \ldots, \alpha_n). \tag{5-1}$$
If we fix all rows except row $i$ and regard $D$ as a function of the $i$th row,
it is often convenient to write $D(\alpha_i)$ for $D(A)$. Thus, we may abbreviate
(5-1) to
$$D(c\alpha_i + \alpha_i') = cD(\alpha_i) + D(\alpha_i')$$
so long as it is clear what the meaning is.


Example 1. Let $k_1, \ldots, k_n$ be positive integers, $1 \le k_i \le n$, and
let $a$ be an element of $K$. For each $n \times n$ matrix $A$ over $K$, define
$$D(A) = aA(1, k_1) \cdots A(n, k_n). \tag{5-2}$$
Then the function $D$ defined by (5-2) is n-linear. For, if we regard $D$ as a
function of the $i$th row of $A$, the others being fixed, we may write
$$D(\alpha_i) = A(i, k_i)b$$
where $b$ is some fixed element of $K$. Let $\alpha_i' = (A'_{i1}, \ldots, A'_{in})$. Then we
have
$$\begin{aligned} D(c\alpha_i + \alpha_i') &= [cA(i, k_i) + A'(i, k_i)]b \\ &= cD(\alpha_i) + D(\alpha_i'). \end{aligned}$$
Thus $D$ is a linear function of each of the rows of $A$.
A particular n-linear function of this type is
$$D(A) = A_{11}A_{22} \cdots A_{nn}.$$
In other words, the 'product of the diagonal entries' is an n-linear function
on $K^{n \times n}$.


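Example 2. Let us find all 2-linear functions on $2 \times 2$ matrices over
$K$. Let $D$ be such a function. If we denote the rows of the $2 \times 2$ identity
matrix by $\epsilon_1, \epsilon_2$, we have
$$D(A) = D(A_{11}\epsilon_1 + A_{12}\epsilon_2, A_{21}\epsilon_1 + A_{22}\epsilon_2).$$
Using the fact that $D$ is 2-linear, (5-1), we have
$$\begin{aligned} D(A) &= A_{11}D(\epsilon_1, A_{21}\epsilon_1 + A_{22}\epsilon_2) + A_{12}D(\epsilon_2, A_{21}\epsilon_1 + A_{22}\epsilon_2) \\ &= A_{11}A_{21}D(\epsilon_1, \epsilon_1) + A_{11}A_{22}D(\epsilon_1, \epsilon_2) \\ &\quad + A_{12}A_{21}D(\epsilon_2, \epsilon_1) + A_{12}A_{22}D(\epsilon_2, \epsilon_2). \end{aligned}$$
Thus $D$ is completely determined by the four scalars
$$D(\epsilon_1, \epsilon_1), \quad D(\epsilon_1, \epsilon_2), \quad D(\epsilon_2, \epsilon_1), \quad \text{and} \quad D(\epsilon_2, \epsilon_2).$$
The reader should find it easy to verify the following. If $a, b, c, d$ are any
four scalars in $K$ and if we define
$$D(A) = A_{11}A_{21}a + A_{11}A_{22}b + A_{12}A_{21}c + A_{12}A_{22}d$$
then $D$ is a 2-linear function on $2 \times 2$ matrices over $K$ and
$$D(\epsilon_1, \epsilon_1) = a, \quad D(\epsilon_1, \epsilon_2) = b,$$
$$D(\epsilon_2, \epsilon_1) = c, \quad D(\epsilon_2, \epsilon_2) = d.$$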
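The formula at the end of Example 2 can be checked mechanically. In the Python sketch below, Python integers stand in for the ring $K$, rows are pairs, and `make_2_linear` is a hypothetical name. The sketch confirms the four values $D(\epsilon_i, \epsilon_j)$. It also checks linearity in the first row for one sample choice of rows, which illustrates 2-linearity but does not prove it.

```python
def make_2_linear(a, b, c, d):
    """Return the function D of Example 2 determined by the four scalars.

    Rows are pairs (A_i1, A_i2); Python integers stand in for the ring K.
    """
    def D(row1, row2):
        (A11, A12), (A21, A22) = row1, row2
        return A11 * A21 * a + A11 * A22 * b + A12 * A21 * c + A12 * A22 * d
    return D


e1, e2 = (1, 0), (0, 1)          # rows of the 2 x 2 identity matrix
D = make_2_linear(5, 7, -2, 3)   # arbitrary sample scalars a, b, c, d
print(D(e1, e1), D(e1, e2), D(e2, e1), D(e2, e2))   # 5 7 -2 3

# Linearity in the first row with the second row held fixed, for one sample.
x, y, z, k = (2, -1), (4, 3), (1, 6), 10
lhs = D(tuple(k * u + v for u, v in zip(x, y)), z)
print(lhs == k * D(x, z) + D(y, z))                 # True
```

Choosing $(a, b, c, d) = (0, 1, -1, 0)$ gives the function $A_{11}A_{22} - A_{12}A_{21}$, which is taken up next in Example 3.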


Lemma. A linear combination of n-linear functions is n-linear.

Proof. It suffices to prove that a linear combination of two
n-linear functions is n-linear. Let $D$ and $E$ be n-linear functions. If $a$ and $b$
belong to $K$, the linear combination $aD + bE$ is of course defined by

$$(aD + bE)(A) = aD(A) + bE(A).$$
Hence, if we fix all rows except row $i$,
$$\begin{aligned} (aD + bE)(c\alpha_i + \alpha_i') &= aD(c\alpha_i + \alpha_i') + bE(c\alpha_i + \alpha_i') \\ &= acD(\alpha_i) + aD(\alpha_i') + bcE(\alpha_i) + bE(\alpha_i') \\ &= c(aD + bE)(\alpha_i) + (aD + bE)(\alpha_i'). \end{aligned}$$
∎


If $K$ is a field and $V$ is the set of $n \times n$ matrices over $K$, the above
lemma says the following. The set of n-linear functions on $V$ is a subspace
of the space of all functions from $V$ into $K$.


Example 3. Let $D$ be the function defined on $2 \times 2$ matrices over
$K$ by


$$D(A) = A_{11}A_{22} - A_{12}A_{21}. \tag{5-3}$$
Now $D$ is the sum of two functions of the type described in Example 1:
$$D = D_1 + D_2$$
$$D_1(A) = A_{11}A_{22}$$
$$D_2(A) = -A_{12}A_{21}.$$
By the above lemma, $D$ is a 2-linear function. The reader who has had
any experience with determinants will not find this surprising, since he
will recognize (5-3) as the usual definition of the determinant of a $2 \times 2$
matrix. Of course the function $D$ we have just defined is not a typical
2-linear function. It has many special properties. Let us note some of these
properties. First, if $I$ is the $2 \times 2$ identity matrix, then $D(I) = 1$, i.e.,
$D(\epsilon_1, \epsilon_2) = 1$. Second, if the two rows of $A$ are equal, then
$$D(A) = A_{11}A_{12} - A_{12}A_{11} = 0.$$
Third, if $A'$ is the matrix obtained from a $2 \times 2$ matrix $A$ by interchanging its rows, then $D(A') = -D(A)$; for
$$\begin{aligned} D(A') &= A'_{11}A'_{22} - A'_{12}A'_{21} \\ &= A_{21}A_{12} - A_{22}A_{11} \\ &= -D(A). \end{aligned}$$


Definition. Let $D$ be an n-linear function. We say $D$ is alternating
(or alternate) if the following two conditions are satisfied:


(a) $D(A) = 0$ whenever two rows of $A$ are equal.
(b) If $A'$ is a matrix obtained from $A$ by interchanging two rows of $A$,
then $D(A') = -D(A)$.


We shall prove below that any n-linear function D which satisfies (a) 
automatically satisfies (b). We have put both properties in the definition 
of alternating n-linear function as a matter of convenience. The reader 
will probably also note that if $D$ satisfies (b) and $A$ is a matrix with two
equal rows, then $D(A) = -D(A)$. It is tempting to conclude that $D$
satisfies condition (a) as well. This is true, for example, if $K$ is a field in
which $1 + 1 \neq 0$, but in general (a) is not a consequence of (b).


Definition. Let $K$ be a commutative ring with identity, and let $n$ be a
positive integer. Suppose $D$ is a function from $n \times n$ matrices over $K$ into
$K$. We say that $D$ is a determinant function if $D$ is n-linear, alternating,
and $D(I) = 1$.


As we stated earlier, we shall ultimately show that there is exactly
one determinant function on $n \times n$ matrices over $K$. This is easily seen
for $1 \times 1$ matrices $A = [a]$ over $K$. The function $D$ given by $D(A) = a$
is a determinant function, and clearly this is the only determinant function on $1 \times 1$ matrices. We are also in a position to dispose of the case
$n = 2$. The function
$$D(A) = A_{11}A_{22} - A_{12}A_{21}$$
was shown in Example 3 to be a determinant function. Furthermore, the
formula exhibited in Example 2 shows that $D$ is the only determinant
function on $2 \times 2$ matrices. For we showed that for any 2-linear function $D$


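$$\begin{aligned} D(A) &= A_{11}A_{21}D(\epsilon_1, \epsilon_1) + A_{11}A_{22}D(\epsilon_1, \epsilon_2) \\ &\quad + A_{12}A_{21}D(\epsilon_2, \epsilon_1) + A_{12}A_{22}D(\epsilon_2, \epsilon_2). \end{aligned}$$
If $D$ is alternating, then
$$D(\epsilon_1, \epsilon_1) = D(\epsilon_2, \epsilon_2) = 0$$
and
$$D(\epsilon_2, \epsilon_1) = -D(\epsilon_1, \epsilon_2) = -D(I).$$
If $D$ also satisfies $D(I) = 1$, then
$$D(A) = A_{11}A_{22} - A_{12}A_{21}.$$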


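Example 4. Let $F$ be a field and let $D$ be any alternating 3-linear
function on $3 \times 3$ matrices over the polynomial ring $F[x]$.
Let
$$A = \begin{bmatrix} x & 0 & -x^2 \\ 0 & 1 & 0 \\ 1 & 0 & x^3 \end{bmatrix}.$$
If we denote the rows of the $3 \times 3$ identity matrix by $\epsilon_1, \epsilon_2, \epsilon_3$, then
$$D(A) = D(x\epsilon_1 - x^2\epsilon_3, \epsilon_2, \epsilon_1 + x^3\epsilon_3).$$
Since $D$ is linear as a function of each row,
$$\begin{aligned} D(A) &= xD(\epsilon_1, \epsilon_2, \epsilon_1 + x^3\epsilon_3) - x^2D(\epsilon_3, \epsilon_2, \epsilon_1 + x^3\epsilon_3) \\ &= xD(\epsilon_1, \epsilon_2, \epsilon_1) + x^4D(\epsilon_1, \epsilon_2, \epsilon_3) - x^2D(\epsilon_3, \epsilon_2, \epsilon_1) - x^5D(\epsilon_3, \epsilon_2, \epsilon_3). \end{aligned}$$
Because $D$ is alternating it follows that
$$D(A) = (x^4 + x^2)D(\epsilon_1, \epsilon_2, \epsilon_3).$$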


Lemma. Let $D$ be a 2-linear function with the property that $D(A) = 0$
for all $2 \times 2$ matrices $A$ over $K$ having equal rows. Then $D$ is alternating.


Proof. What we must show is that if $A$ is a $2 \times 2$ matrix and $A'$
is obtained by interchanging the rows of $A$, then $D(A') = -D(A)$. If the
rows of $A$ are $\alpha$ and $\beta$, this means we must show that $D(\beta, \alpha) = -D(\alpha, \beta)$.
Since $D$ is 2-linear,
$$D(\alpha + \beta, \alpha + \beta) = D(\alpha, \alpha) + D(\alpha, \beta) + D(\beta, \alpha) + D(\beta, \beta).$$
By our hypothesis $D(\alpha + \beta, \alpha + \beta) = D(\alpha, \alpha) = D(\beta, \beta) = 0$. So
$$0 = D(\alpha, \beta) + D(\beta, \alpha).$$
∎
Lemma. Let $D$ be an n-linear function on $n \times n$ matrices over $K$.
Suppose $D$ has the property that $D(A) = 0$ whenever two adjacent rows of
$A$ are equal. Then $D$ is alternating.


Proof. We must show that $D(A) = 0$ when any two rows of $A$
are equal, and that $D(A') = -D(A)$ if $A'$ is obtained by interchanging
some two rows of $A$. First, let us suppose that $A'$ is obtained by interchanging two adjacent rows of $A$. The reader should see that the argument
used in the proof of the preceding lemma extends to the present case and
gives us $D(A') = -D(A)$.

Now let $B$ be obtained by interchanging rows $i$ and $j$ of $A$, where
$i < j$. We can obtain $B$ from $A$ by a succession of interchanges of pairs of
adjacent rows. We begin by interchanging row $i$ with row $(i + 1)$ and
continue until the rows are in the order
$$\alpha_1, \ldots, \alpha_{i-1}, \alpha_{i+1}, \ldots, \alpha_j, \alpha_i, \alpha_{j+1}, \ldots, \alpha_n.$$
This requires $k = j - i$ interchanges of adjacent rows. We now move $\alpha_j$
to the $i$th position using $(k - 1)$ interchanges of adjacent rows. We have
thus obtained $B$ from $A$ by $k + (k - 1) = 2k - 1$ interchanges of adjacent rows. Thus
$$D(B) = (-1)^{2k-1}D(A) = -D(A).$$
Suppose $A$ is any $n \times n$ matrix with two equal rows, say $\alpha_i = \alpha_j$
with $i < j$. If $j = i + 1$, then $A$ has two equal and adjacent rows and
$D(A) = 0$. If $j > i + 1$, we interchange $\alpha_{i+1}$ and $\alpha_j$ and the resulting
matrix $B$ has two equal and adjacent rows, so $D(B) = 0$. On the other
hand, $D(B) = -D(A)$, hence $D(A) = 0$. ∎
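The counting in this proof can be replayed on a list of row labels. The Python sketch below uses 0-based indices, and its function name is hypothetical. It first moves row $i$ downward through $k = j - i$ adjacent interchanges. It then moves the old row $j$ upward through $k - 1$ more, and finally reports the total number of adjacent interchanges used.

```python
def swap_via_adjacent(rows, i, j):
    """Interchange rows i < j using only adjacent interchanges, as in the proof.

    Returns the new list of rows and the number of adjacent interchanges used.
    Indices are 0-based here.
    """
    rows = list(rows)
    count = 0
    # Move row i down past rows i+1, ..., j: k = j - i interchanges.
    for t in range(i, j):
        rows[t], rows[t + 1] = rows[t + 1], rows[t]
        count += 1
    # The old row j now sits at position j - 1; move it up to position i:
    # k - 1 further interchanges.
    for t in range(j - 1, i, -1):
        rows[t], rows[t - 1] = rows[t - 1], rows[t]
        count += 1
    return rows, count


rows, count = swap_via_adjacent(["a1", "a2", "a3", "a4", "a5"], 1, 3)
print(rows, count)   # ['a1', 'a4', 'a3', 'a2', 'a5'] 3
```

The count is always $2(j - i) - 1$, which is odd. That odd count is why each application of the adjacent case contributes one factor of $-1$, giving $D(B) = -D(A)$.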


Definition. If $n > 1$ and $A$ is an $n \times n$ matrix over $K$, we let $A(i|j)$
denote the $(n - 1) \times (n - 1)$ matrix obtained by deleting the $i$th row and
$j$th column of $A$. If $D$ is an $(n - 1)$-linear function and $A$ is an $n \times n$
matrix, we put $D_{ij}(A) = D[A(i|j)]$.


Theorem 1. Let $n > 1$ and let $D$ be an alternating $(n - 1)$-linear
function on $(n - 1) \times (n - 1)$ matrices over $K$. For each $j$, $1 \le j \le n$,
the function $E_j$ defined by
$$E_j(A) = \sum_{i=1}^{n} (-1)^{i+j}A_{ij}D_{ij}(A) \tag{5-4}$$
is an alternating n-linear function on $n \times n$ matrices $A$. If $D$ is a determinant function, so is each $E_j$.


Proof. If $A$ is an $n \times n$ matrix, $D_{ij}(A)$ is independent of the $i$th
row of $A$. Since $D$ is $(n - 1)$-linear, it is clear that $D_{ij}$ is linear as a function of any row except row $i$. Therefore $A_{ij}D_{ij}(A)$ is an n-linear function
of $A$. A linear combination of n-linear functions is n-linear; hence, $E_j$ is
n-linear. To prove that $E_j$ is alternating, it will suffice to show that
$E_j(A) = 0$ whenever $A$ has two equal and adjacent rows. Suppose $\alpha_k = \alpha_{k+1}$. If $i \neq k$ and $i \neq k + 1$, the matrix $A(i|j)$ has two equal rows, and
thus $D_{ij}(A) = 0$. Therefore
$$E_j(A) = (-1)^{k+j}A_{kj}D_{kj}(A) + (-1)^{k+1+j}A_{(k+1)j}D_{(k+1)j}(A).$$

Sec. 5.2 Determinant Functions 147 


Since $\alpha_k = \alpha_{k+1}$,
$$A_{kj} = A_{(k+1)j} \quad \text{and} \quad A(k|j) = A(k + 1|j).$$
Clearly then $E_j(A) = 0$.

Now suppose $D$ is a determinant function. If $I^{(n)}$ is the $n \times n$ identity
matrix, then $I^{(n)}(j|j)$ is the $(n - 1) \times (n - 1)$ identity matrix $I^{(n-1)}$.
Since $I^{(n)}_{ij} = \delta_{ij}$, it follows from (5-4) that
$$E_j(I^{(n)}) = D(I^{(n-1)}). \tag{5-5}$$

Now $D(I^{(n-1)}) = 1$, so that $E_j(I^{(n)}) = 1$ and $E_j$ is a determinant function. ∎


Corollary. Let $K$ be a commutative ring with identity and let $n$ be a
positive integer. There exists at least one determinant function on $K^{n \times n}$.
Proof. We have shown the existence of a determinant function
on $1 \times 1$ matrices over $K$, and even on $2 \times 2$ matrices over $K$. Theorem 1
tells us explicitly how to construct a determinant function on $n \times n$
matrices, given such a function on $(n - 1) \times (n - 1)$ matrices. The
corollary follows by induction. ∎


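EXAMPLE 5. If B is a $2 \times 2$ matrix over K, we let

$$|B| = B_{11}B_{22} - B_{12}B_{21}.$$

Then $|B| = D(B)$, where D is the determinant function on $2 \times 2$ matrices. We showed that this function on $K^{2 \times 2}$ is unique. Let

$$A = \begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}$$

be a $3 \times 3$ matrix over K. If we define $E_1$, $E_2$, $E_3$ as in (5-4), then

$$E_1(A) = A_{11}\begin{vmatrix} A_{22} & A_{23} \\ A_{32} & A_{33} \end{vmatrix} - A_{21}\begin{vmatrix} A_{12} & A_{13} \\ A_{32} & A_{33} \end{vmatrix} + A_{31}\begin{vmatrix} A_{12} & A_{13} \\ A_{22} & A_{23} \end{vmatrix} \tag{5-6}$$

$$E_2(A) = -A_{12}\begin{vmatrix} A_{21} & A_{23} \\ A_{31} & A_{33} \end{vmatrix} + A_{22}\begin{vmatrix} A_{11} & A_{13} \\ A_{31} & A_{33} \end{vmatrix} - A_{32}\begin{vmatrix} A_{11} & A_{13} \\ A_{21} & A_{23} \end{vmatrix} \tag{5-7}$$

$$E_3(A) = A_{13}\begin{vmatrix} A_{21} & A_{22} \\ A_{31} & A_{32} \end{vmatrix} - A_{23}\begin{vmatrix} A_{11} & A_{12} \\ A_{31} & A_{32} \end{vmatrix} + A_{33}\begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} \tag{5-8}$$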
It follows from Theorem 1 that $E_1$, $E_2$, and $E_3$ are determinant functions. Actually, as we shall show later, $E_1 = E_2 = E_3$, but this is not yet apparent even in this simple case. It could, however, be verified directly, by expanding each of the above expressions. Instead of doing this we give some specific examples.

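(a) Let $K = R[x]$ and

$$A = \begin{bmatrix} x-1 & x^2 & x^3 \\ 0 & x-2 & 1 \\ 0 & 0 & x-3 \end{bmatrix}.$$

Then

$$E_1(A) = (x-1)\begin{vmatrix} x-2 & 1 \\ 0 & x-3 \end{vmatrix} = (x-1)(x-2)(x-3)$$

$$E_2(A) = -x^2\begin{vmatrix} 0 & 1 \\ 0 & x-3 \end{vmatrix} + (x-2)\begin{vmatrix} x-1 & x^3 \\ 0 & x-3 \end{vmatrix} = (x-1)(x-2)(x-3)$$

and

$$E_3(A) = x^3\begin{vmatrix} 0 & x-2 \\ 0 & 0 \end{vmatrix} - \begin{vmatrix} x-1 & x^2 \\ 0 & 0 \end{vmatrix} + (x-3)\begin{vmatrix} x-1 & x^2 \\ 0 & x-2 \end{vmatrix} = (x-1)(x-2)(x-3).$$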
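(b) Let $K = R$ and

$$A = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix}.$$

Then

$$E_1(A) = \begin{vmatrix} 1 & 0 \\ 0 & 1 \end{vmatrix} = 1, \qquad E_2(A) = -\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = 1, \qquad E_3(A) = -\begin{vmatrix} 0 & 1 \\ 1 & 0 \end{vmatrix} = 1.$$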
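The three expansions (5-6), (5-7), and (5-8) can be transcribed directly and compared numerically. The Python sketch below is our own transcription. The names `det2`, `E1`, `E2`, `E3`, and `example_a` are hypothetical, and rows and columns are 0-based in the code. It evaluates the polynomial matrix of Example 5(a) at nine integers x. Each $E_j(A)$ is a polynomial in x of degree at most 5, so agreement with $(x-1)(x-2)(x-3)$ at nine points forces equality as polynomials. It is still only a check of this one example, not a proof that $E_1 = E_2 = E_3$ in general.

```python
def det2(a, b, c, d):
    """The 2 x 2 determinant |a b; c d| = ad - bc."""
    return a * d - b * c


def E1(A):  # (5-6): expansion down column 1
    return (A[0][0] * det2(A[1][1], A[1][2], A[2][1], A[2][2])
            - A[1][0] * det2(A[0][1], A[0][2], A[2][1], A[2][2])
            + A[2][0] * det2(A[0][1], A[0][2], A[1][1], A[1][2]))


def E2(A):  # (5-7): expansion down column 2
    return (-A[0][1] * det2(A[1][0], A[1][2], A[2][0], A[2][2])
            + A[1][1] * det2(A[0][0], A[0][2], A[2][0], A[2][2])
            - A[2][1] * det2(A[0][0], A[0][2], A[1][0], A[1][2]))


def E3(A):  # (5-8): expansion down column 3
    return (A[0][2] * det2(A[1][0], A[1][1], A[2][0], A[2][1])
            - A[1][2] * det2(A[0][0], A[0][1], A[2][0], A[2][1])
            + A[2][2] * det2(A[0][0], A[0][1], A[1][0], A[1][1]))


def example_a(x):
    """The matrix of Example 5(a) with the number x substituted for the indeterminate."""
    return [[x - 1, x ** 2, x ** 3],
            [0, x - 2, 1],
            [0, 0, x - 3]]


for x in range(-3, 6):  # nine sample points
    A = example_a(x)
    assert E1(A) == E2(A) == E3(A) == (x - 1) * (x - 2) * (x - 3)

# Example 5(b)
B = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
assert E1(B) == E2(B) == E3(B) == 1
```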
Exercises 


1. Each of the following expressions defines a function D on the set of $3 \times 3$ matrices over the field of real numbers. In which of these cases is D a 3-linear function?

(a) $D(A) = A_{11} + A_{22} + A_{33}$;

(b) $D(A) = (A_{11})^2 + 3A_{11}A_{22}$;

(c) $D(A) = A_{11}A_{12}A_{33}$;

(d) $D(A) = A_{13}A_{22}A_{32} + 5A_{12}A_{22}A_{32}$;

(e) $D(A) = 0$;

(f) $D(A) = 1$.


2. Verify directly that the three functions $E_1$, $E_2$, $E_3$ defined by (5-6), (5-7), and (5-8) are identical.


3. Let K be a commutative ring with identity. If A is a $2 \times 2$ matrix over K, the classical adjoint of A is the $2 \times 2$ matrix adj A defined by

$$\operatorname{adj} A = \begin{bmatrix} A_{22} & -A_{12} \\ -A_{21} & A_{11} \end{bmatrix}.$$

If det denotes the unique determinant function on $2 \times 2$ matrices over K, show that

(a) $(\operatorname{adj} A)A = A(\operatorname{adj} A) = (\det A)I$;
(b) $\det(\operatorname{adj} A) = \det(A)$;
(c) $\operatorname{adj}(A^t) = (\operatorname{adj} A)^t$.

($A^t$ denotes the transpose of A.)

4. Let A be a $2 \times 2$ matrix over a field F. Show that A is invertible if and only if $\det A \ne 0$. When A is invertible, give a formula for $A^{-1}$.


5. Let A be a $2 \times 2$ matrix over a field F, and suppose that $A^2 = 0$. Show for each scalar c that $\det(cI - A) = c^2$.


6. Let K be a subfield of the complex numbers and n a positive integer. Let $j_1, \ldots, j_n$ and $k_1, \ldots, k_n$ be positive integers not exceeding n. For an $n \times n$ matrix A over K define

$$D(A) = A(j_1, k_1)A(j_2, k_2) \cdots A(j_n, k_n).$$

Prove that D is n-linear if and only if the integers $j_1, \ldots, j_n$ are distinct.


7. Let K be a commutative ring with identity. Show that the determinant function on $2 \times 2$ matrices A over K is alternating and 2-linear as a function of the columns of A.


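8. Let K be a commutative ring with identity. Define a function D on $3 \times 3$ matrices over K by the rule

$$D(A) = A_{11}\det\begin{bmatrix} A_{22} & A_{23} \\ A_{32} & A_{33} \end{bmatrix} - A_{12}\det\begin{bmatrix} A_{21} & A_{23} \\ A_{31} & A_{33} \end{bmatrix} + A_{13}\det\begin{bmatrix} A_{21} & A_{22} \\ A_{31} & A_{32} \end{bmatrix}.$$

Show that D is alternating and 3-linear as a function of the columns of A.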


9. Let K be a commutative ring with identity and D an alternating n-linear function on $n \times n$ matrices over K. Show that
(a) $D(A) = 0$, if one of the rows of A is 0.
(b) $D(B) = D(A)$, if B is obtained from A by adding a scalar multiple of one row of A to another.


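10. Let F be a field, A a $2 \times 3$ matrix over F, and $(c_1, c_2, c_3)$ the vector in $F^3$ defined by

$$c_1 = \begin{vmatrix} A_{12} & A_{13} \\ A_{22} & A_{23} \end{vmatrix}, \qquad c_2 = \begin{vmatrix} A_{13} & A_{11} \\ A_{23} & A_{21} \end{vmatrix}, \qquad c_3 = \begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix}.$$

Show that

(a) rank $(A) = 2$ if and only if $(c_1, c_2, c_3) \ne 0$;

(b) if A has rank 2, then $(c_1, c_2, c_3)$ is a basis for the solution space of the system of equations $AX = 0$.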


11. Let K be a commutative ring with identity, and let D be an alternating 2-linear function on $2 \times 2$ matrices over K. Show that $D(A) = (\det A)D(I)$ for all A. Now use this result (no computations with the entries allowed) to show that $\det(AB) = (\det A)(\det B)$ for any $2 \times 2$ matrices A and B over K.


12. Let F be a field and D a function on $n \times n$ matrices over F (with values in F). Suppose $D(AB) = D(A)D(B)$ for all A, B. Show that either $D(A) = 0$ for all A, or $D(I) = 1$. In the latter case show that $D(A) \ne 0$ whenever A is invertible.


13. Let R be the field of real numbers, and let D be a function on $2 \times 2$ matrices over R, with values in R, such that $D(AB) = D(A)D(B)$ for all A, B. Suppose also that

$$D\left(\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}\right) \ne D\left(\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\right).$$

Prove the following.

(a) $D(0) = 0$;

(b) $D(A) = 0$ if $A^2 = 0$;

(c) $D(B) = -D(A)$ if B is obtained by interchanging the rows (or columns) of A;

(d) $D(A) = 0$ if one row (or one column) of A is 0;

(e) $D(A) = 0$ whenever A is singular.
14. Let A be a $2 \times 2$ matrix over a field F. Then the set of all matrices of the form $f(A)$, where f is a polynomial over F, is a commutative ring K with identity. If B is a $2 \times 2$ matrix over K, the determinant of B is then a $2 \times 2$ matrix over F, of the form $f(A)$. Suppose I is the $2 \times 2$ identity matrix over F and that B is the $2 \times 2$ matrix over K

$$B = \begin{bmatrix} A - A_{11}I & -A_{12}I \\ -A_{21}I & A - A_{22}I \end{bmatrix}.$$

Show that $\det B = f(A)$, where $f = x^2 - (A_{11} + A_{22})x + \det A$, and also that $f(A) = 0$.


5.3. Permutations and the Uniqueness 
of Determinants 


In this section we prove the uniqueness of the determinant function on $n \times n$ matrices over K. The proof will lead us quite naturally to consider permutations and some of their basic properties.

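Suppose D is an alternating n-linear function on $n \times n$ matrices over K. Let A be an $n \times n$ matrix over K with rows $\alpha_1, \alpha_2, \ldots, \alpha_n$. If we denote the rows of the $n \times n$ identity matrix over K by $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$, then

$$\alpha_i = \sum_{j=1}^{n} A(i, j)\varepsilon_j, \qquad 1 \le i \le n. \tag{5-9}$$

Hence

$$D(A) = D\Big(\sum_j A(1, j)\varepsilon_j, \alpha_2, \ldots, \alpha_n\Big) = \sum_j A(1, j) D(\varepsilon_j, \alpha_2, \ldots, \alpha_n).$$

If we now replace $\alpha_2$ by $\sum_k A(2, k)\varepsilon_k$, we see that

$$D(\varepsilon_j, \alpha_2, \ldots, \alpha_n) = \sum_k A(2, k) D(\varepsilon_j, \varepsilon_k, \ldots, \alpha_n).$$

Thus

$$D(A) = \sum_{j,k} A(1, j)A(2, k) D(\varepsilon_j, \varepsilon_k, \ldots, \alpha_n).$$

In $D(\varepsilon_j, \varepsilon_k, \ldots, \alpha_n)$ we next replace $\alpha_3$ by $\sum_l A(3, l)\varepsilon_l$ and so on. We finally obtain a complicated but theoretically important expression for D(A), namely

$$D(A) = \sum_{k_1, k_2, \ldots, k_n} A(1, k_1)A(2, k_2) \cdots A(n, k_n) D(\varepsilon_{k_1}, \varepsilon_{k_2}, \ldots, \varepsilon_{k_n}). \tag{5-10}$$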


In (5-10) the sum is extended over all sequences $(k_1, k_2, \ldots, k_n)$ of positive integers not exceeding n. This shows that D is a finite sum of functions of the type described by (5-2). It should be noted that (5-10) is a consequence just of the assumption that D is n-linear, and that a special case of (5-10) was obtained in Example 2. Since D is alternating,

$$D(\varepsilon_{k_1}, \varepsilon_{k_2}, \ldots, \varepsilon_{k_n}) = 0$$

whenever two of the indices $k_i$ are equal. A sequence $(k_1, k_2, \ldots, k_n)$ of positive integers not exceeding n, with the property that no two of the $k_i$ are equal, is called a permutation of degree n. In (5-10) we therefore need to sum only over those sequences which are permutations of degree n.
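Formula (5-10), and the remark that only permutations contribute, can be checked by brute force for a small n. In the Python sketch below (illustrative only; `D` and `identity_rows` are hypothetical names), D is the determinant function built by expansion down the first column as in Theorem 1. The loop runs over all $n^n$ sequences $(k_1, \ldots, k_n)$, and only the sequences with distinct entries give a non-zero value of $D(\varepsilon_{k_1}, \ldots, \varepsilon_{k_n})$.

```python
from itertools import product
from math import prod


def D(A):
    """A determinant function, built as in Theorem 1 by expanding down column 1."""
    if len(A) == 1:
        return A[0][0]
    return sum((-1) ** i * A[i][0] * D([row[1:] for k, row in enumerate(A) if k != i])
               for i in range(len(A)))


def identity_rows(ks, n):
    """The matrix whose rows are epsilon_{k_1}, ..., epsilon_{k_n} (0-based k's)."""
    return [[1 if col == k else 0 for col in range(n)] for k in ks]


A = [[2, 7, 1], [0, 3, 5], [4, 1, 6]]
n = len(A)
total = 0
nonzero_terms = 0
for ks in product(range(n), repeat=n):        # all n**n sequences (k_1, ..., k_n)
    value = D(identity_rows(ks, n))           # D(eps_k1, ..., eps_kn)
    if value != 0:
        nonzero_terms += 1                    # happens only when the k's are distinct
    total += prod(A[i][ks[i]] for i in range(n)) * value

print(total == D(A), nonzero_terms)  # True 6
```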

Since a finite sequence, or n-tuple, is a function defined on the first n positive integers, a permutation of degree n may be defined as a one-one function from the set $\{1, 2, \ldots, n\}$ onto itself. Such a function $\sigma$ corresponds to the n-tuple $(\sigma 1, \sigma 2, \ldots, \sigma n)$ and is thus simply a rule for ordering $1, 2, \ldots, n$ in some well-defined way.


If D is an alternating n-linear function and A is an $n \times n$ matrix over K, we then have

$$D(A) = \sum_{\sigma} A(1, \sigma 1) \cdots A(n, \sigma n) D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}) \tag{5-11}$$

where the sum is extended over the distinct permutations $\sigma$ of degree n.

Next we shall show that

$$D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}) = \pm D(\varepsilon_1, \ldots, \varepsilon_n) \tag{5-12}$$

where the sign $\pm$ depends only on the permutation $\sigma$. The reason for this is as follows. The sequence $(\sigma 1, \sigma 2, \ldots, \sigma n)$ can be obtained from the sequence $(1, 2, \ldots, n)$ by a finite number of interchanges of pairs of elements. For example, if $\sigma 1 \ne 1$, we can transpose 1 and $\sigma 1$, obtaining $(\sigma 1, \ldots, 1, \ldots)$. Proceeding in this way we shall arrive at the sequence $(\sigma 1, \ldots, \sigma n)$ after n or fewer such interchanges of pairs. Since D is alternating, the sign of its value changes each time that we interchange two of the rows $\varepsilon_i$ and $\varepsilon_j$. Thus, if we pass from $(1, 2, \ldots, n)$ to $(\sigma 1, \sigma 2, \ldots, \sigma n)$ by means of m interchanges of pairs $(i, j)$, we shall have

$$D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}) = (-1)^m D(\varepsilon_1, \ldots, \varepsilon_n).$$

In particular, if D is a determinant function

$$D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}) = (-1)^m \tag{5-13}$$

where m depends only upon $\sigma$, not upon D. Thus all determinant functions assign the same value to the matrix with rows $\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}$, and this value is either 1 or $-1$.

Now a basic fact about permutations is the following. If $\sigma$ is a permutation of degree n, one can pass from the sequence $(1, 2, \ldots, n)$ to the sequence $(\sigma 1, \sigma 2, \ldots, \sigma n)$ by a succession of interchanges of pairs, and this can be done in a variety of ways; however, no matter how it is done, the number of interchanges used is either always even or always odd. The permutation is then called even or odd, respectively. One defines the sign of a permutation by

$$\operatorname{sgn} \sigma = \begin{cases} 1, & \text{if } \sigma \text{ is even} \\ -1, & \text{if } \sigma \text{ is odd} \end{cases}$$

the symbol '1' denoting here the integer 1.

We shall show below that this basic property of permutations can be deduced from what we already know about determinant functions. Let us assume this for the time being. Then the integer m occurring in (5-13) is always even if $\sigma$ is an even permutation, and is always odd if $\sigma$ is an odd permutation. For any alternating n-linear function D we then have

$$D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}) = (\operatorname{sgn} \sigma) D(\varepsilon_1, \ldots, \varepsilon_n)$$

and using (5-11)

$$D(A) = \Big[\sum_{\sigma} (\operatorname{sgn} \sigma) A(1, \sigma 1) \cdots A(n, \sigma n)\Big] D(I). \tag{5-14}$$

Of course I denotes the $n \times n$ identity matrix.
From (5-14) we see that there is precisely one determinant function on $n \times n$ matrices over K. If we denote this function by det, it is given by

$$\det(A) = \sum_{\sigma} (\operatorname{sgn} \sigma) A(1, \sigma 1) \cdots A(n, \sigma n) \tag{5-15}$$

the sum being extended over the distinct permutations $\sigma$ of degree n. We can formally summarize as follows.
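Formula (5-15) translates almost word for word into code. The Python sketch below is an illustration, not part of the text. The names `sgn` and `det` are hypothetical, and permutations are 0-based tuples. It computes the sign exactly as described above, by counting the interchanges used to pass from $(1, 2, \ldots, n)$ to $(\sigma 1, \ldots, \sigma n)$. That this parity is well defined is the fact the text assumes for the moment. The sum has n! terms, so the sketch is only practical for small n.

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    """Sign of the permutation sigma = (sigma1, ..., sigman) of 0..n-1: pass
    from (0, 1, ..., n-1) to sigma by interchanges of pairs, placing the
    correct entry in position 0, then 1, and so on, and return (-1)**m."""
    seq = list(range(len(sigma)))
    m = 0
    for i, target in enumerate(sigma):
        if seq[i] != target:
            k = seq.index(target)              # where the wanted entry is now
            seq[i], seq[k] = seq[k], seq[i]    # one interchange of a pair
            m += 1
    return -1 if m % 2 else 1


def det(A):
    """Formula (5-15): sum over sigma of (sgn sigma) A(1, sigma1) ... A(n, sigman)."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))


print(det([[1, 2], [3, 4]]))  # -2
```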


Theorem 2. Let K be a commutative ring with identity and let n be a positive integer. There is precisely one determinant function on the set of $n \times n$ matrices over K, and it is the function det defined by (5-15). If D is any alternating n-linear function on $K^{n \times n}$, then for each $n \times n$ matrix A

$$D(A) = (\det A)D(I).$$


This is the theorem we have been seeking, but we have left a gap in the proof. That gap is the proof that for a given permutation $\sigma$, when we pass from $(1, 2, \ldots, n)$ to $(\sigma 1, \sigma 2, \ldots, \sigma n)$ by interchanging pairs, the number of interchanges is always even or always odd. This basic combinatorial fact can be proved without any reference to determinants; however, we should like to point out how it follows from the existence of a determinant function on $n \times n$ matrices.
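Let us take K to be the ring of integers. Let D be a determinant function on $n \times n$ matrices over K. Let $\sigma$ be a permutation of degree n, and suppose we pass from $(1, 2, \ldots, n)$ to $(\sigma 1, \sigma 2, \ldots, \sigma n)$ by m interchanges of pairs $(i, j)$, $i \ne j$. As we showed in (5-13)

$$(-1)^m = D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n})$$

that is, the number $(-1)^m$ must be the value of D on the matrix with rows $\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}$. If

$$D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}) = 1,$$

then m must be even. If

$$D(\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}) = -1,$$

then m must be odd. Since this value of D does not depend on which succession of interchanges we used, and $1 \ne -1$ in the ring of integers, every such succession has the same parity.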
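The parity fact can also be observed experimentally. The Python sketch below (illustrative only, with hypothetical function names) passes between $(1, \ldots, n)$ and $\sigma$ in two different ways. The first is the transposition procedure described after (5-12). The second interchanges only adjacent entries, as in a bubble sort. For every permutation of degree at most 6 it confirms that the two counts, though often different, always have the same parity. An exhaustive check of small degrees is evidence, not a proof; the argument above is the proof.

```python
from itertools import permutations


def interchanges_by_transposing(sigma):
    """Interchanges used by the procedure after (5-12): fix position 1, then 2, ..."""
    seq, m = list(range(len(sigma))), 0
    for i, target in enumerate(sigma):
        if seq[i] != target:
            k = seq.index(target)
            seq[i], seq[k] = seq[k], seq[i]
            m += 1
    return m


def interchanges_by_adjacent_swaps(sigma):
    """Adjacent interchanges a bubble sort uses to restore (0, ..., n-1) from
    sigma; performing them in reverse order passes from (0, ..., n-1) to sigma."""
    seq, m = list(sigma), 0
    for end in range(len(seq) - 1, 0, -1):
        for i in range(end):
            if seq[i] > seq[i + 1]:
                seq[i], seq[i + 1] = seq[i + 1], seq[i]
                m += 1
    return m


for n in range(1, 7):
    for sigma in permutations(range(n)):
        a = interchanges_by_transposing(sigma)
        b = interchanges_by_adjacent_swaps(sigma)
        assert a % 2 == b % 2          # same parity, whichever route is taken

print(interchanges_by_transposing((2, 1, 0)), interchanges_by_adjacent_swaps((2, 1, 0)))  # 1 3
```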

Since we have an explicit formula for the determinant of an $n \times n$ matrix and this formula involves the permutations of degree n, let us conclude this section by making a few more observations about permutations. First, let us note that there are precisely $n! = 1 \cdot 2 \cdots n$ permutations of degree n. For, if $\sigma$ is such a permutation, there are n possible choices for $\sigma 1$; when this choice has been made, there are $(n-1)$ choices for $\sigma 2$, then $(n-2)$ choices for $\sigma 3$, and so on. So there are

$$n(n-1)(n-2) \cdots 2 \cdot 1 = n!$$

permutations $\sigma$. The formula (5-15) for det (A) thus gives det (A) as a sum of n! terms, one for each permutation of degree n. A given term is a product

$$A(1, \sigma 1) \cdots A(n, \sigma n)$$

of n entries of A, one entry from each row and one from each column, and is prefixed by a '+' or '$-$' sign according as $\sigma$ is an even or odd permutation.

When permutations are regarded as one-one functions from the set $\{1, 2, \ldots, n\}$ onto itself, one can define a product of permutations. The product of $\sigma$ and $\tau$ will simply be the composed function $\sigma\tau$ defined by

$$(\sigma\tau)(i) = \sigma(\tau(i)).$$

If $\varepsilon$ denotes the identity permutation, $\varepsilon(i) = i$, then each $\sigma$ has an inverse $\sigma^{-1}$ such that

$$\sigma\sigma^{-1} = \sigma^{-1}\sigma = \varepsilon.$$

One can summarize these observations by saying that, under the operation of composition, the set of permutations of degree n is a group. This group is usually called the symmetric group of degree n.
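The group structure is easy to experiment with. In the Python sketch below, `compose`, `inverse`, and the 0-based tuple encoding are our own assumptions. A permutation $\sigma$ is stored as the tuple $(\sigma 1, \ldots, \sigma n)$, and the product follows the rule $(\sigma\tau)(i) = \sigma(\tau(i))$. The sketch checks closure and inverses in the symmetric group of degree 3. The last line shows that the product generally depends on the order of the factors.

```python
from itertools import permutations


def compose(sigma, tau):
    """The product sigma tau: (sigma tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))


def inverse(sigma):
    """The permutation sigma^{-1}, with sigma^{-1}(sigma(i)) = i."""
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(inv)


n = 3
S = set(permutations(range(n)))       # the symmetric group of degree 3
identity = tuple(range(n))            # the identity permutation
assert len(S) == 6                    # n! elements
for s in S:
    assert compose(s, inverse(s)) == compose(inverse(s), s) == identity
    for t in S:
        assert compose(s, t) in S     # closed under composition

print(compose((1, 0, 2), (0, 2, 1)), compose((0, 2, 1), (1, 0, 2)))  # (1, 2, 0) (2, 0, 1)
```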
From the point of view of products of permutations, the basic property of the sign of a permutation is that

$$\operatorname{sgn}(\sigma\tau) = (\operatorname{sgn} \sigma)(\operatorname{sgn} \tau). \tag{5-16}$$

In other words, $\sigma\tau$ is an even permutation if $\sigma$ and $\tau$ are either both even or both odd, while $\sigma\tau$ is odd if one of the two permutations is odd and the other is even. One can see this from the definition of the sign in terms of successive interchanges of pairs $(i, j)$. It may also be instructive if we point out how $\operatorname{sgn}(\sigma\tau) = (\operatorname{sgn} \sigma)(\operatorname{sgn} \tau)$ follows from a fundamental property of determinants.

Let K be the ring of integers and let $\sigma$ and $\tau$ be permutations of degree n. Let $\varepsilon_1, \ldots, \varepsilon_n$ be the rows of the $n \times n$ identity matrix over K, let A be the matrix with rows $\varepsilon_{\tau 1}, \ldots, \varepsilon_{\tau n}$ and let B be the matrix with rows $\varepsilon_{\sigma 1}, \ldots, \varepsilon_{\sigma n}$. The ith row of A contains exactly one non-zero entry, namely the 1 in column $\tau i$. From this it is easy to see that $\varepsilon_{\sigma\tau i}$ is the ith row of the product matrix AB, since that row is $\varepsilon_{\tau i}B$, which is the $\tau i$th row of B. Now

$$\det(A) = \operatorname{sgn} \tau, \qquad \det(B) = \operatorname{sgn} \sigma, \qquad \text{and} \qquad \det(AB) = \operatorname{sgn}(\sigma\tau).$$

So we shall have $\operatorname{sgn}(\sigma\tau) = (\operatorname{sgn} \sigma)(\operatorname{sgn} \tau)$ as soon as we prove the following.


Theorem 3. Let K be a commutative ring with identity, and let A and B be $n \times n$ matrices over K. Then

$$\det(AB) = (\det A)(\det B).$$

Proof. Let B be a fixed $n \times n$ matrix over K, and for each $n \times n$ matrix A define $D(A) = \det(AB)$. If we denote the rows of A by $\alpha_1, \ldots, \alpha_n$, then

$$D(\alpha_1, \ldots, \alpha_n) = \det(\alpha_1 B, \ldots, \alpha_n B).$$

Here $\alpha_j B$ denotes the $1 \times n$ matrix which is the product of the $1 \times n$ matrix $\alpha_j$ and the $n \times n$ matrix B. Since

$$(c\alpha_i + \alpha_i')B = c\alpha_i B + \alpha_i' B$$

and det is n-linear, it is easy to see that D is n-linear. If $\alpha_i = \alpha_j$ for some $i \ne j$, then $\alpha_i B = \alpha_j B$, and since det is alternating,

$$D(\alpha_1, \ldots, \alpha_n) = 0.$$

Hence, D is alternating. Now D is an alternating n-linear function, and by Theorem 2

$$D(A) = (\det A)D(I).$$

But $D(I) = \det(IB) = \det B$, so

$$\det(AB) = D(A) = (\det A)(\det B). \qquad \blacksquare$$
The fact that $\operatorname{sgn}(\sigma\tau) = (\operatorname{sgn} \sigma)(\operatorname{sgn} \tau)$ is only one of many corollaries to Theorem 3. We shall consider some of these corollaries in the next section.
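The argument for (5-16) can be replayed numerically. The Python sketch below (illustrative; the helper names are hypothetical) forms the matrices A and B of the text, with rows $\varepsilon_{\tau i}$ and $\varepsilon_{\sigma i}$. For every pair of permutations of degree 4 it checks that AB has rows $\varepsilon_{\sigma\tau i}$ and that the signs multiply. For speed it computes sgn by counting inversions (pairs $i < j$ with $\sigma i > \sigma j$). It is a standard fact, assumed here rather than proved in the text, that this count has the same parity as any succession of interchanges.

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    """(-1)**(number of inversions of sigma)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def det(M):
    """Formula (5-15)."""
    n = len(M)
    return sum(sgn(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def rows_eps(sigma):
    """The matrix with rows eps_{sigma 1}, ..., eps_{sigma n}."""
    n = len(sigma)
    return [[1 if j == sigma[i] else 0 for j in range(n)] for i in range(n)]


def compose(sigma, tau):
    """(sigma tau)(i) = sigma(tau(i))."""
    return tuple(sigma[tau[i]] for i in range(len(tau)))


for sigma in permutations(range(4)):
    for tau in permutations(range(4)):
        A, B = rows_eps(tau), rows_eps(sigma)
        assert matmul(A, B) == rows_eps(compose(sigma, tau))      # rows eps_{sigma tau i}
        assert det(A) == sgn(tau) and det(B) == sgn(sigma)
        assert sgn(compose(sigma, tau)) == sgn(sigma) * sgn(tau)  # (5-16)
```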
Exercises

1. If K is a commutative ring with identity and A is the matrix over K given by

$$A = \begin{bmatrix} 0 & a & b \\ -a & 0 & c \\ -b & -c & 0 \end{bmatrix}$$

show that $\det A = 0$.

2. Prove that the determinant of the Vandermonde matrix

$$\begin{bmatrix} 1 & a & a^2 \\ 1 & b & b^2 \\ 1 & c & c^2 \end{bmatrix}$$

is $(b - a)(c - a)(c - b)$.

3. List explicitly the six permutations of degree 3, state which are odd and which are even, and use this to give the complete formula (5-15) for the determinant of a $3 \times 3$ matrix.


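4. Let $\sigma$ and $\tau$ be the permutations of degree 4 defined by $\sigma 1 = 2$, $\sigma 2 = 3$, $\sigma 3 = 4$, $\sigma 4 = 1$, $\tau 1 = 3$, $\tau 2 = 1$, $\tau 3 = 2$, $\tau 4 = 4$.
(a) Is $\sigma$ odd or even? Is $\tau$ odd or even?
(b) Find $\sigma\tau$ and $\tau\sigma$.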


5. If A is an invertible $n \times n$ matrix over a field, show that $\det A \ne 0$.
6. Let A be a $2 \times 2$ matrix over a field. Prove that $\det(I + A) = 1 + \det A$ if and only if trace $(A) = 0$.


7. An $n \times n$ matrix A is called triangular if $A_{ij} = 0$ whenever $i > j$ or if $A_{ij} = 0$ whenever $i < j$. Prove that the determinant of a triangular matrix is the product $A_{11}A_{22} \cdots A_{nn}$ of its diagonal entries.

8. Let A be a $3 \times 3$ matrix over the field of complex numbers. We form the matrix $xI - A$ with polynomial entries, the i, j entry of this matrix being the polynomial $\delta_{ij}x - A_{ij}$. If $f = \det(xI - A)$, show that f is a monic polynomial of degree 3. If we write

$$f = (x - c_1)(x - c_2)(x - c_3)$$

with complex numbers $c_1$, $c_2$, and $c_3$, prove that

$$c_1 + c_2 + c_3 = \operatorname{trace}(A) \quad \text{and} \quad c_1c_2c_3 = \det A.$$


9. Let n be a positive integer and F a field. If $\sigma$ is a permutation of degree n, prove that the function

$$T(x_1, \ldots, x_n) = (x_{\sigma 1}, \ldots, x_{\sigma n})$$

is an invertible linear operator on $F^n$.


10. Let F be a field, n a positive integer, and S the set of $n \times n$ matrices over F. Let V be the vector space of all functions from S into F. Let W be the set of alternating n-linear functions on S. Prove that W is a subspace of V. What is the dimension of W?
11. Let T be a linear operator on $F^n$. Define

$$D_T(\alpha_1, \ldots, \alpha_n) = \det(T\alpha_1, \ldots, T\alpha_n).$$

(a) Show that $D_T$ is an alternating n-linear function.

(b) If

$$c = \det(T\varepsilon_1, \ldots, T\varepsilon_n)$$

show that for any n vectors $\alpha_1, \ldots, \alpha_n$ we have

$$\det(T\alpha_1, \ldots, T\alpha_n) = c \det(\alpha_1, \ldots, \alpha_n).$$

(c) If $\mathfrak{B}$ is any ordered basis for $F^n$ and A is the matrix of T in the ordered basis $\mathfrak{B}$, show that $\det A = c$.

(d) What do you think is a reasonable name for the scalar c?
12. If $\sigma$ is a permutation of degree n and A is an $n \times n$ matrix over the field F with row vectors $\alpha_1, \ldots, \alpha_n$, let $\sigma(A)$ denote the $n \times n$ matrix with row vectors $\alpha_{\sigma 1}, \ldots, \alpha_{\sigma n}$.

(a) Prove that $\sigma(AB) = \sigma(A)B$, and in particular that $\sigma(A) = \sigma(I)A$.

(b) If T is the linear operator of Exercise 9, prove that the matrix of T in the standard ordered basis is $\sigma(I)$.

(c) Is $\sigma^{-1}(I)$ the inverse matrix of $\sigma(I)$?

(d) Is it true that $\sigma(A)$ is similar to A?
13. Prove that the sign function on permutations is unique in the following sense. If f is any function which assigns to each permutation of degree n an integer, and if $f(\sigma\tau) = f(\sigma)f(\tau)$, then f is identically 0, or f is identically 1, or f is the sign function.


5.4. Additional Properties of Determinants 


In this section we shall relate some of the useful properties of the determinant function on $n \times n$ matrices. Perhaps the first thing we should point out is the following. In our discussion of det A, the rows of A have played a privileged role. Since there is no fundamental difference between rows and columns, one might very well expect that det A is an alternating n-linear function of the columns of A. This is the case, and to prove it, it suffices to show that

$$\det(A^t) = \det(A) \tag{5-17}$$

where $A^t$ denotes the transpose of A.
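If $\sigma$ is a permutation of degree n,

$$A^t(i, \sigma i) = A(\sigma i, i).$$

From the expression (5-15) one then has

$$\det(A^t) = \sum_{\sigma} (\operatorname{sgn} \sigma) A(\sigma 1, 1) \cdots A(\sigma n, n).$$

When $i = \sigma^{-1}j$, $A(\sigma i, i) = A(j, \sigma^{-1}j)$. Thus

$$A(\sigma 1, 1) \cdots A(\sigma n, n) = A(1, \sigma^{-1}1) \cdots A(n, \sigma^{-1}n).$$

Since $\sigma\sigma^{-1}$ is the identity permutation,

$$(\operatorname{sgn} \sigma)(\operatorname{sgn} \sigma^{-1}) = 1 \quad \text{or} \quad \operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn}(\sigma).$$

Furthermore, as $\sigma$ varies over all permutations of degree n, so does $\sigma^{-1}$. Therefore

$$\det(A^t) = \sum_{\sigma} (\operatorname{sgn} \sigma^{-1}) A(1, \sigma^{-1}1) \cdots A(n, \sigma^{-1}n) = \det A$$

proving (5-17).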
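Both ingredients of the proof of (5-17) are easy to spot-check. The Python sketch below is illustrative; the names are hypothetical, and the sign is computed by counting inversions, a standard stand-in for counting interchanges. It confirms $\operatorname{sgn}(\sigma^{-1}) = \operatorname{sgn} \sigma$ for all permutations of degree 5 and $\det(A^t) = \det A$ for one integer matrix.

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    """(-1)**(number of inversions of sigma)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def inverse(sigma):
    """sigma^{-1} as a tuple."""
    inv = [0] * len(sigma)
    for i, s in enumerate(sigma):
        inv[s] = i
    return tuple(inv)


def det(A):
    """Formula (5-15), one entry from each row."""
    n = len(A)
    return sum(sgn(s) * prod(A[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def transpose(A):
    return [list(column) for column in zip(*A)]


for s in permutations(range(5)):
    assert sgn(inverse(s)) == sgn(s)

A = [[2, -1, 0, 3], [1, 4, 5, -2], [0, 3, -3, 1], [7, 0, 1, 1]]
assert det(transpose(A)) == det(A)   # (5-17)
```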

On certain occasions one needs to compute specific determinants. When this is necessary, it is frequently useful to take advantage of the following fact. If B is obtained from A by adding a multiple of one row of A to another (or a multiple of one column to another), then

$$\det B = \det A. \tag{5-18}$$

We shall prove the statement about rows. Let B be obtained from A by adding $c\alpha_j$ to $\alpha_i$, where $i < j$. Since det is linear as a function of the ith row

$$\det B = \det A + c \det(\alpha_1, \ldots, \alpha_j, \ldots, \alpha_j, \ldots, \alpha_n) = \det A,$$

because the second determinant has $\alpha_j$ in both the ith and the jth places and therefore vanishes.


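Another useful fact is the following. Suppose we have an $n \times n$ matrix of the block form

$$\begin{bmatrix} A & B \\ 0 & C \end{bmatrix}$$

where A is an $r \times r$ matrix, C is an $s \times s$ matrix, B is $r \times s$, and 0 denotes the $s \times r$ zero matrix. Then

$$\det\begin{bmatrix} A & B \\ 0 & C \end{bmatrix} = (\det A)(\det C). \tag{5-19}$$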


To prove this, define

$$D(A, B, C) = \det\begin{bmatrix} A & B \\ 0 & C \end{bmatrix}.$$


If we fix A and B, then D is alternating and s-linear as a function of the 
rows of C. Thus, by Theorem 2 


D(A, B, C) = (det C)D(A, B, I) 


where I is the $s \times s$ identity matrix. By subtracting multiples of the rows of I from the rows of B and using the statement above (5-18), we obtain


D(A, B, I) = D(A, 0, I). 


Now D(A, 0, I) is clearly alternating and r-linear as a function of the rows 
of A. Thus 
D(A, 0, I) = (det A)D(I, 0, I). 
But $D(I, 0, I) = 1$, so

$$D(A, B, C) = (\det C)D(A, B, I) = (\det C)D(A, 0, I) = (\det C)(\det A).$$


By the same sort of argument, or by taking transposes

$$\det\begin{bmatrix} A & 0 \\ B & C \end{bmatrix} = (\det A)(\det C). \tag{5-20}$$
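As a spot check of (5-19) and (5-20), the Python sketch below assembles block matrices from an $r \times r$ matrix A, an $s \times s$ matrix C, and a rectangular B, with r = 2 and s = 3. It then compares both block forms against $(\det A)(\det C)$. It is illustrative only: `det` reimplements (5-15), and `upper` and `lower` are hypothetical helpers.

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    """(-1)**(number of inversions of sigma)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def det(M):
    """Formula (5-15)."""
    n = len(M)
    return sum(sgn(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def upper(A, B, C):
    """The block matrix [[A, B], [0, C]] of (5-19); B is r x s."""
    r, s = len(A), len(C)
    return [A[i] + B[i] for i in range(r)] + [[0] * r + C[i] for i in range(s)]


def lower(A, B, C):
    """The block matrix [[A, 0], [B, C]] of (5-20); here B is s x r."""
    r, s = len(A), len(C)
    return [A[i] + [0] * s for i in range(r)] + [B[i] + C[i] for i in range(s)]


A = [[2, 1], [5, 3]]                       # r = 2, det A = 1
C = [[1, 2, 0], [0, 1, 4], [3, 0, 1]]      # s = 3, det C = 25
B_rs = [[7, -1, 2], [0, 5, 9]]             # 2 x 3
B_sr = [[1, 2], [3, 4], [5, 6]]            # 3 x 2

print(det(upper(A, B_rs, C)), det(lower(A, B_sr, C)), det(A) * det(C))  # 25 25 25
```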


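EXAMPLE 6. Suppose K is the field of rational numbers and we wish to compute the determinant of the $4 \times 4$ matrix

$$A = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 2 & 2 & 0 & 2 \\ 4 & 1 & -1 & -1 \\ 1 & 2 & 3 & 0 \end{bmatrix}.$$

By subtracting suitable multiples of row 1 from rows 2, 3, and 4, we obtain the matrix

$$\begin{bmatrix} 1 & -1 & 2 & 3 \\ 0 & 4 & -4 & -4 \\ 0 & 5 & -9 & -13 \\ 0 & 3 & 1 & -3 \end{bmatrix}$$

which we know by (5-18) will have the same determinant as A. If we subtract $\frac{5}{4}$ of row 2 from row 3 and then subtract $\frac{3}{4}$ of row 2 from row 4, we obtain

$$B = \begin{bmatrix} 1 & -1 & 2 & 3 \\ 0 & 4 & -4 & -4 \\ 0 & 0 & -4 & -8 \\ 0 & 0 & 4 & 0 \end{bmatrix}$$

and again $\det B = \det A$. The block form of B tells us that

$$\det A = \det B = \begin{vmatrix} 1 & -1 \\ 0 & 4 \end{vmatrix} \begin{vmatrix} -4 & -8 \\ 4 & 0 \end{vmatrix} = 4(32) = 128.$$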
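The procedure of Example 6 is essentially Gaussian elimination. The Python sketch below is an illustration using exact rational arithmetic from the standard `fractions` module; `det_by_elimination` is a hypothetical name. It uses only the row operation of (5-18), together with row interchanges, each of which changes the sign. It carries the reduction all the way to triangular form, where the determinant is the product of the diagonal entries (Exercise 7 of Section 5.3). On the matrix of Example 6 it reproduces 128.

```python
from fractions import Fraction


def det_by_elimination(A):
    """Determinant over the rationals by row reduction."""
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    sign = 1
    for j in range(n):
        # Find a row at or below j with a non-zero entry in column j.
        pivot = next((i for i in range(j, n) if M[i][j] != 0), None)
        if pivot is None:
            return Fraction(0)            # no pivot in column j: the matrix is singular
        if pivot != j:
            M[j], M[pivot] = M[pivot], M[j]
            sign = -sign                  # an interchange changes the sign
        for i in range(j + 1, n):
            factor = M[i][j] / M[j][j]
            # Subtract factor * (row j) from row i; by (5-18) det is unchanged.
            M[i] = [a - factor * b for a, b in zip(M[i], M[j])]
    result = Fraction(sign)
    for j in range(n):
        result *= M[j][j]                 # triangular: product of the diagonal
    return result


A = [[1, -1, 2, 3], [2, 2, 0, 2], [4, 1, -1, -1], [1, 2, 3, 0]]
print(det_by_elimination(A))  # 128
```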





Now let $n > 1$ and let A be an $n \times n$ matrix over K. In Theorem 1, we showed how to construct a determinant function on $n \times n$ matrices, given one on $(n-1) \times (n-1)$ matrices. Now that we have proved the uniqueness of the determinant function, the formula (5-4) tells us the following. If we fix any column index j,

$$\det A = \sum_{i=1}^{n} (-1)^{i+j} A_{ij} \det A(i|j).$$

The scalar $(-1)^{i+j} \det A(i|j)$ is usually called the i, j cofactor of A or the cofactor of the i, j entry of A. The above formula for det A is then called the expansion of det A by cofactors of the jth column (or sometimes the expansion by minors of the jth column). If we set

$$C_{ij} = (-1)^{i+j} \det A(i|j)$$

then the above formula says that for each j

$$\det A = \sum_{i=1}^{n} A_{ij} C_{ij}$$

where the cofactor $C_{ij}$ is $(-1)^{i+j}$ times the determinant of the $(n-1) \times (n-1)$ matrix obtained by deleting the ith row and jth column of A.
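If $j \ne k$, then

$$\sum_{i=1}^{n} A_{ik} C_{ij} = 0.$$

For, replace the jth column of A by its kth column, and call the resulting matrix B. Then B has two equal columns and so $\det B = 0$ (by (5-17), det is alternating as a function of the columns). Since $B(i|j) = A(i|j)$, we have

$$0 = \det B = \sum_{i=1}^{n} (-1)^{i+j} B_{ij} \det B(i|j) = \sum_{i=1}^{n} (-1)^{i+j} A_{ik} \det A(i|j) = \sum_{i=1}^{n} A_{ik} C_{ij}.$$

These properties of the cofactors can be summarized by

$$\sum_{i=1}^{n} A_{ik} C_{ij} = \delta_{jk} \det A. \tag{5-21}$$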


The $n \times n$ matrix adj A, which is the transpose of the matrix of cofactors of A, is called the classical adjoint of A. Thus

$$(\operatorname{adj} A)_{ij} = C_{ji} = (-1)^{i+j} \det A(j|i). \tag{5-22}$$

The formulas (5-21) can be summarized in the matrix equation

$$(\operatorname{adj} A)A = (\det A)I. \tag{5-23}$$


We wish to see that $A(\operatorname{adj} A) = (\det A)I$ also. Since $A^t(i|j) = A(j|i)^t$, we have

$$(-1)^{i+j} \det A^t(i|j) = (-1)^{j+i} \det A(j|i)$$

which simply says that the i, j cofactor of $A^t$ is the j, i cofactor of A. Thus

$$\operatorname{adj}(A^t) = (\operatorname{adj} A)^t. \tag{5-24}$$

By applying (5-23) to $A^t$, we obtain

$$(\operatorname{adj} A^t)A^t = (\det A^t)I = (\det A)I$$

and transposing

$$A(\operatorname{adj} A^t)^t = (\det A)I.$$

Using (5-24), we have what we want:

$$A(\operatorname{adj} A) = (\det A)I. \tag{5-25}$$
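The cofactor identities are straightforward to check by machine. The Python sketch below builds adj A entry by entry from (5-22) and verifies (5-23) and (5-25) for one integer matrix. It is illustrative; the helper names are hypothetical, and `det` reimplements (5-15).

```python
from itertools import permutations
from math import prod


def sgn(sigma):
    """(-1)**(number of inversions of sigma)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def det(M):
    """Formula (5-15)."""
    n = len(M)
    return sum(sgn(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def minor(A, i, j):
    """A(i|j): delete row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def adj(A):
    """Classical adjoint, (5-22): (adj A)_{ij} = C_{ji} = (-1)^(i+j) det A(j|i)."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
d = det(A)
scalar = [[d if i == j else 0 for j in range(3)] for i in range(3)]   # (det A) I
assert matmul(adj(A), A) == scalar    # (5-23)
assert matmul(A, adj(A)) == scalar    # (5-25)
```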


As for matrices over a field, an $n \times n$ matrix A over K is called invertible over K if there is an $n \times n$ matrix $A^{-1}$ with entries in K such that $AA^{-1} = A^{-1}A = I$. If such an inverse matrix exists it is unique; for the same argument used in Chapter 1 shows that when $BA = AC = I$ we have $B = C$. The formulas (5-23) and (5-25) tell us the following about invertibility of matrices over K. If the element det A has a multiplicative inverse in K, then A is invertible and $A^{-1} = (\det A)^{-1} \operatorname{adj} A$ is the unique inverse of A. Conversely, it is easy to see that if A is invertible over K, the element det A is invertible in K. For, if $BA = I$ we have

$$1 = \det I = \det(BA) = (\det B)(\det A).$$

What we have proved is the following.


Theorem 4. Let A be an $n \times n$ matrix over K. Then A is invertible over K if and only if det A is invertible in K. When A is invertible, the unique inverse for A is

$$A^{-1} = (\det A)^{-1} \operatorname{adj} A.$$

In particular, an $n \times n$ matrix over a field is invertible if and only if its determinant is different from zero.
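Theorem 4 turns into a procedure once we know which elements of K are invertible. The Python sketch below (illustrative; function names are hypothetical) handles two cases: the ring of integers, whose only invertible elements are 1 and $-1$, and the field of rational numbers, using the standard `fractions` module for exact arithmetic. The matrix of Example 8 serves as the test case.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def sgn(sigma):
    """(-1)**(number of inversions of sigma)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def det(M):
    """Formula (5-15)."""
    n = len(M)
    return sum(sgn(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def minor(A, i, j):
    """A(i|j): delete row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]


def adj(A):
    """Classical adjoint, (5-22)."""
    n = len(A)
    return [[(-1) ** (i + j) * det(minor(A, j, i)) for j in range(n)] for i in range(n)]


def inverse_over_integers(A):
    """A^{-1} over Z if det A is a unit (1 or -1) of Z, else None (Theorem 4)."""
    d = det(A)
    if d not in (1, -1):
        return None
    return [[d * x for x in row] for row in adj(A)]    # here (det A)^{-1} = det A


def inverse_over_rationals(A):
    """A^{-1} = (det A)^{-1} adj A over Q, or None if det A = 0."""
    d = det(A)
    if d == 0:
        return None
    return [[Fraction(x, d) for x in row] for row in adj(A)]


A = [[1, 2], [3, 4]]                    # the matrix of Example 8
print(inverse_over_integers(A))         # None
print(inverse_over_rationals(A))
```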


We should point out that this determinant criterion for invertibility proves that an $n \times n$ matrix with either a left or right inverse is invertible. This proof is completely independent of the proof which we gave in Chapter 1 for matrices over a field. We should also like to point out what invertibility means for matrices with polynomial entries. If K is the polynomial ring $F[x]$, the only elements of K which are invertible are the non-zero scalar polynomials. For if f and g are polynomials and $fg = 1$, we have $\deg f + \deg g = 0$ so that $\deg f = \deg g = 0$, i.e., f and g are scalar polynomials. So an $n \times n$ matrix over the polynomial ring $F[x]$ is invertible over $F[x]$ if and only if its determinant is a non-zero scalar polynomial.


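EXAMPLE 7. Let $K = R[x]$, the ring of polynomials over the field of real numbers. Let

$$A = \begin{bmatrix} x^2 + x & x + 1 \\ x - 1 & 1 \end{bmatrix}, \qquad B = \begin{bmatrix} x^2 - 1 & x + 2 \\ x^2 - 2x + 3 & x \end{bmatrix}.$$

Then, by a short computation, $\det A = x + 1$ and $\det B = -6$. Thus A is not invertible over K, whereas B is invertible over K. Note that

$$\operatorname{adj} A = \begin{bmatrix} 1 & -x - 1 \\ -x + 1 & x^2 + x \end{bmatrix}, \qquad \operatorname{adj} B = \begin{bmatrix} x & -x - 2 \\ -x^2 + 2x - 3 & x^2 - 1 \end{bmatrix}$$

and $(\operatorname{adj} A)A = (x + 1)I$, $(\operatorname{adj} B)B = -6I$. Of course,

$$B^{-1} = -\frac{1}{6}\begin{bmatrix} x & -x - 2 \\ -x^2 + 2x - 3 & x^2 - 1 \end{bmatrix}.$$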
EXAMPLE 8. Let K be the ring of integers and

$$A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}.$$

Then $\det A = -2$ and

$$\operatorname{adj} A = \begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix}.$$

Thus A is not invertible as a matrix over the ring of integers; however, we can also regard A as a matrix over the field of rational numbers. If we do, then A is invertible and

$$A^{-1} = -\frac{1}{2}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ \frac{3}{2} & -\frac{1}{2} \end{bmatrix}.$$

In connection with invertible matrices, we should like to mention one further elementary fact. Similar matrices have the same determinant, that is, if P is invertible over K and $B = P^{-1}AP$, then $\det B = \det A$. This is clear since

$$\det(P^{-1}AP) = (\det P^{-1})(\det A)(\det P) = \det A.$$

This simple observation makes it possible to define the determinant of a linear operator on a finite dimensional vector space. If T is a linear operator on V, we define the determinant of T to be the determinant of any $n \times n$ matrix which represents T in an ordered basis for V. Since all such matrices are similar, they have the same determinant and our definition makes sense. In this connection, see Exercise 11 of section 5.3.

We should like now to discuss Cramer's rule for solving systems of linear equations. Suppose A is an $n \times n$ matrix over the field F and we wish to solve the system of linear equations $AX = Y$ for some given n-tuple $(y_1, \ldots, y_n)$. If $AX = Y$, then

$$(\operatorname{adj} A)AX = (\operatorname{adj} A)Y$$

and so

$$(\det A)X = (\operatorname{adj} A)Y.$$
Thus

$$(\det A)x_j = \sum_{i=1}^{n} (\operatorname{adj} A)_{ji} y_i = \sum_{i=1}^{n} (-1)^{i+j} y_i \det A(i|j).$$

This last expression is the determinant of the $n \times n$ matrix obtained by replacing the jth column of A by Y. If $\det A = 0$, all this tells us nothing; however, if $\det A \ne 0$, we have what is known as Cramer's rule. Let A be an $n \times n$ matrix over the field F such that $\det A \ne 0$. If $y_1, \ldots, y_n$ are any scalars in F, the unique solution $X = A^{-1}Y$ of the system of equations $AX = Y$ is given by

$$x_j = \frac{\det B_j}{\det A}, \qquad j = 1, \ldots, n$$

where $B_j$ is the $n \times n$ matrix obtained from A by replacing the jth column of A by Y.
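Cramer's rule can be transcribed directly. The Python sketch below (illustrative; names hypothetical, with exact arithmetic from the standard `fractions` module) builds each $B_j$ by replacing column j of A with Y and divides determinants. As the closing remarks of this section point out, this is far more expensive than elimination, so the sketch is meant only to make the statement concrete.

```python
from fractions import Fraction
from itertools import permutations
from math import prod


def sgn(sigma):
    """(-1)**(number of inversions of sigma)."""
    n = len(sigma)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if sigma[i] > sigma[j])
    return -1 if inversions % 2 else 1


def det(M):
    """Formula (5-15)."""
    n = len(M)
    return sum(sgn(s) * prod(M[i][s[i]] for i in range(n)) for s in permutations(range(n)))


def cramer(A, Y):
    """Solve AX = Y by x_j = det B_j / det A (requires det A != 0)."""
    d = det(A)
    if d == 0:
        raise ValueError("Cramer's rule needs det A != 0")
    n = len(A)
    X = []
    for j in range(n):
        # B_j: replace column j of A by Y.
        B_j = [row[:j] + [Y[i]] + row[j + 1:] for i, row in enumerate(A)]
        X.append(Fraction(det(B_j), d))
    return X


A = [[2, 1], [1, 3]]
Y = [3, 5]
X = cramer(A, Y)
print(X)  # [Fraction(4, 5), Fraction(7, 5)]
```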

In concluding this chapter, we should like to make some comments 
which serve to place determinants in what we believe to be the proper 
perspective. From time to time it is necessary to compute specific deter- 
minants, and this section has been partially devoted to techniques which 
will facilitate such work. However, the principal role of determinants in 
this book is theoretical. There is no disputing the beauty of facts such as 
Cramer’s rule. But Cramer’s rule is an inefficient tool for solving systems 
of linear equations, chiefly because it involves too many computations. 
So one should concentrate on what Cramer’s rule says, rather than on 
how to compute with it. Indeed, while reflecting on this entire chapter, 
we hope that the reader will place more emphasis on understanding what 
the determinant function is and how it behaves than on how to compute 
determinants of specific matrices. 





Exercises 


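1. Use the classical adjoint formula to compute the inverses of each of the following $3 \times 3$ real matrices.

$$\begin{bmatrix} -2 & 3 & 2 \\ 6 & 0 & 3 \\ 4 & 1 & -1 \end{bmatrix}, \qquad \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix}.$$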


2. Use Cramer's rule to solve each of the following systems of linear equations over the field of rational numbers.

(a)
$$\begin{aligned} x + y + z &= 11 \\ 2x - 6y - z &= 0 \\ 3x + 4y + 2z &= 0. \end{aligned}$$

(b)
$$\begin{aligned} 3x - 2y &= 7 \\ 3y - 2z &= 6 \\ 3z - 2x &= -1. \end{aligned}$$

3. An $n \times n$ matrix A over a field F is skew-symmetric if $A^t = -A$. If A is a skew-symmetric $n \times n$ matrix with complex entries and n is odd, prove that $\det A = 0$.


4. An $n \times n$ matrix A over a field F is called orthogonal if $AA^t = I$. If A is orthogonal, show that $\det A = \pm 1$. Give an example of an orthogonal matrix for which $\det A = -1$.

5. An $n \times n$ matrix A over the field of complex numbers is said to be unitary if $AA^* = I$ ($A^*$ denotes the conjugate transpose of A). If A is unitary, show that $|\det A| = 1$.


6. Let T and U be linear operators on the finite dimensional vector space V. Prove
(a) $\det(TU) = (\det T)(\det U)$;
(b) T is invertible if and only if $\det T \ne 0$.


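7. Let A be an $n \times n$ matrix over K, a commutative ring with identity. Suppose A has the block form

$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A_k \end{bmatrix}$$

where $A_j$ is an $r_j \times r_j$ matrix. Prove

$$\det A = (\det A_1)(\det A_2) \cdots (\det A_k).$$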


8. Let V be the vector space of $n \times n$ matrices over the field F. Let B be a fixed element of V and let $T_B$ be the linear operator on V defined by $T_B(A) = AB - BA$. Show that $\det T_B = 0$.


9. Let A be an $n \times n$ matrix over a field, $A \ne 0$. If r is any positive integer between 1 and n, an $r \times r$ submatrix of A is any $r \times r$ matrix obtained by deleting $(n - r)$ rows and $(n - r)$ columns of A. The determinant rank of A is the largest positive integer r such that some $r \times r$ submatrix of A has a non-zero determinant. Prove that the determinant rank of A is equal to the row rank of A (= column rank A).


10. Let A be an $n \times n$ matrix over the field F. Prove that there are at most n distinct scalars c in F such that $\det(cI - A) = 0$.

11. Let A and B be $n \times n$ matrices over the field F. Show that if A is invertible there are at most n scalars c in F for which the matrix $cA + B$ is not invertible.


12. If V is the vector space of $n \times n$ matrices over F and B is a fixed $n \times n$ matrix over F, let $L_B$ and $R_B$ be the linear operators on V defined by $L_B(A) = BA$ and $R_B(A) = AB$. Show that

(a) $\det L_B = (\det B)^n$;

(b) $\det R_B = (\det B)^n$.


13. Let V be the vector space of all $n \times n$ matrices over the field of complex numbers, and let B be a fixed $n \times n$ matrix over C. Define a linear operator $M_B$ on V by $M_B(A) = BAB^*$, where $B^* = \overline{B}^t$. Show that

$$\det M_B = |\det B|^{2n}.$$

Now let H be the set of all Hermitian matrices in V, A being Hermitian if $A = A^*$. Then H is a vector space over the field of real numbers. Show that the function $T_B$ defined by $T_B(A) = BAB^*$ is a linear operator on the real vector space H, and then show that $\det T_B = |\det B|^{2n}$. (Hint: In computing $\det T_B$, show that V has a basis consisting of Hermitian matrices and then show that $\det T_B = \det M_B$.)
14. Let A, B, C, D be commuting $n \times n$ matrices over the field F. Show that the determinant of the $2n \times 2n$ matrix

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

is $\det(AD - BC)$.


5.5. Modules 


If K is a commutative ring with identity, a module over K is an algebraic system which behaves like a vector space, with K playing the role of the scalar field. To be precise, we say that V is a module over K (or a K-module) if

1. there is an addition $(\alpha, \beta) \to \alpha + \beta$ on V, under which V is a commutative group;
2. there is a multiplication $(c, \alpha) \to c\alpha$ of elements $\alpha$ in V and c in K such that

$$\begin{aligned} (c_1 + c_2)\alpha &= c_1\alpha + c_2\alpha \\ c(\alpha_1 + \alpha_2) &= c\alpha_1 + c\alpha_2 \\ (c_1c_2)\alpha &= c_1(c_2\alpha) \\ 1\alpha &= \alpha. \end{aligned}$$


For us, the most important K-modules will be the n-tuple modules $K^n$. The matrix modules $K^{m \times n}$ will also be important. If V is any module, we speak of linear combinations, linear dependence and linear independence, just as we do in a vector space. We must be careful not to apply to V any vector space results which depend upon division by non-zero scalars, the one field operation which may be lacking in the ring K. For example, if $\alpha_1, \ldots, \alpha_k$ are linearly dependent, we cannot conclude that some $\alpha_i$ is a linear combination of the others. This makes it more difficult to find bases in modules.

A basis for the module V is a linearly independent subset which 
spans (or generates) the module. This is the same definition which we gave 
for vector spaces; and, the important property of a basis $\mathcal{B}$ is that each 
element of V can be expressed uniquely as a linear combination of (some 
finite number of) elements of $\mathcal{B}$. If one admits into mathematics the Axiom 
of Choice (see Appendix), it can be shown that every vector space has a 
basis. The reader is well aware that a basis exists in any vector space 
which is spanned by a finite number of vectors. But this is not the case 
for modules. Therefore we need special names for modules which have 
bases and for modules which are spanned by finite numbers of elements. 


Definition. The K-module V is called a free module if it has a basis. 
If V has a finite basis containing n elements, then V is called a free K-module 
with n generators. 
Definition. The module V is finitely generated if it contains a finite 
subset which spans V. The rank of a finitely generated module is the smallest 
integer k such that some k elements span V. 


We repeat that a module may be finitely generated without having 
a finite basis. If V is a free K-module with n generators, then V is 
isomorphic to the module $K^n$. If $\{\beta_1, \dots, \beta_n\}$ is a basis for V, there is an 
isomorphism which sends the vector $c_1\beta_1 + \cdots + c_n\beta_n$ onto the n-tuple 
$(c_1, \dots, c_n)$ in $K^n$. It is not immediately apparent that the same module V 
could not also be a free module on k generators, with $k \ne n$. In other 
words, it is not obvious that any two bases for V must contain the same 
number of elements. The proof of that fact is an interesting application 
of determinants. 


Theorem 5. Let K be a commutative ring with identity. If V is a free 
K-module with n generators, then the rank of V is n. 


Proof. We are to prove that V cannot be spanned by fewer than 
n of its elements. Since V is isomorphic to $K^n$, we must show that, if 
m < n, the module $K^n$ is not spanned by n-tuples $\alpha_1, \dots, \alpha_m$. Let A be 
the matrix with rows $\alpha_1, \dots, \alpha_m$. Suppose that each of the standard basis 
vectors $\epsilon_1, \dots, \epsilon_n$ is a linear combination of $\alpha_1, \dots, \alpha_m$. Then there exists 
a matrix P in $K^{n\times m}$ such that 

$$PA = I$$

where I is the n × n identity matrix. Let $\tilde{A}$ be the n × n matrix obtained 
by adjoining n − m rows of 0's to the bottom of A, and let $\tilde{P}$ be any n × n 
matrix which has the columns of P as its first m columns. Then, since the extra 
columns of $\tilde{P}$ multiply only the zero rows of $\tilde{A}$, 

$$\tilde{P}\tilde{A} = PA = I.$$

Therefore $(\det\tilde{P})(\det\tilde{A}) = 1$, so $\det\tilde{A} \ne 0$. But, since m < n, at least one row of $\tilde{A}$ has all 0 
entries. This contradiction shows that $\alpha_1, \dots, \alpha_m$ do not span $K^n$. ∎ 


It is interesting to note that Theorem 5 establishes the uniqueness 
of the dimension of a (finite-dimensional) vector space. The proof, based 
upon the existence of the determinant function, is quite different from the 
proof we gave in Chapter 2. From Theorem 5 we know that ‘free module 
of rank n’ is the same as ‘free module with n generators.’ 

If V is a module over K, the dual module V* consists of all linear 
functions f from V into K. If V is a free module of rank n, then V* is also 
a free module of rank n. The proof is just the same as for vector spaces. 


If $\{\beta_1, \dots, \beta_n\}$ is an ordered basis for V, there is an associated dual basis 
$\{f_1, \dots, f_n\}$ for the module V*. The function $f_i$ assigns to each $\alpha$ in V its 
ith coordinate relative to $\{\beta_1, \dots, \beta_n\}$: 

$$\alpha = f_1(\alpha)\beta_1 + \cdots + f_n(\alpha)\beta_n.$$

If f is a linear function on V, then 

$$f = f(\beta_1)f_1 + \cdots + f(\beta_n)f_n.$$
5.6. Multilinear Functions 


The purpose of this section is to place our discussion of determinants 
in what we believe to be the proper perspective. We shall treat alternating 
multilinear forms on modules. These forms are the natural generalization 
of determinants as we presented them. The reader who has not read (or 
does not wish to read) the brief account of modules in Section 5.5 can still 
study this section profitably by consistently reading ‘vector space over F 
of dimension n’ for ‘free module over K of rank n.’ 

Let K be a commutative ring with identity and let V be a module 
over K. If r is a positive integer, a function L from $V^r = V \times V \times \cdots \times V$ 
into K is called multilinear if $L(\alpha_1, \dots, \alpha_r)$ is linear as a function of 
each $\alpha_i$ when the other $\alpha_j$'s are held fixed, that is, if for each i 

$$L(\alpha_1, \dots, c\alpha_i + \beta_i, \dots, \alpha_r) = cL(\alpha_1, \dots, \alpha_i, \dots, \alpha_r) + L(\alpha_1, \dots, \beta_i, \dots, \alpha_r).$$

A multilinear function on $V^r$ will also be called an r-linear form on V 
or a multilinear form of degree r on V. Such functions are sometimes 
called r-tensors on V. The collection of all multilinear functions on 
$V^r$ will be denoted by $M^r(V)$. If L and M are in $M^r(V)$, then the sum 
L + M: 

$$(L + M)(\alpha_1, \dots, \alpha_r) = L(\alpha_1, \dots, \alpha_r) + M(\alpha_1, \dots, \alpha_r)$$

is also multilinear; and, if c is an element of K, the product cL: 

$$(cL)(\alpha_1, \dots, \alpha_r) = cL(\alpha_1, \dots, \alpha_r)$$

is multilinear. Therefore $M^r(V)$ is a K-module, a submodule of the 
module of all functions from $V^r$ into K. 

If r = 1 we have $M^1(V) = V^*$, the dual module of linear functions 
on V. Linear functions can also be used to construct examples of multi- 
linear forms of higher order. If $f_1, \dots, f_r$ are linear functions on V, define 

$$L(\alpha_1, \dots, \alpha_r) = f_1(\alpha_1)f_2(\alpha_2)\cdots f_r(\alpha_r).$$

Clearly L is an r-linear form on V. 


EXAMPLE 9. If V is a module, a 2-linear form on V is usually called a 
bilinear form on V. Let A be an n × n matrix with entries in K. Then 

$$L(X, Y) = Y^tAX$$

defines a bilinear form L on the module $K^{n\times 1}$. Similarly, 

$$M(\alpha, \beta) = \alpha A\beta^t$$

defines a bilinear form M on $K^n$. 


EXAMPLE 10. The determinant function associates with each n × n 
matrix A an element det A in K. If det A is considered as a function of 
the rows of A: 

$$\det A = D(\alpha_1, \dots, \alpha_n)$$

then D is an n-linear form on $K^n$. 


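EXAMPLE 11. It is easy to obtain an algebraic expression for the 
general r-linear form on the module $K^n$. If $\alpha_1, \dots, \alpha_r$ are vectors in $K^n$ 
and A is the r × n matrix with rows $\alpha_1, \dots, \alpha_r$, then for any function L 
in $M^r(K^n)$, 

$$\begin{aligned}
L(\alpha_1, \dots, \alpha_r) &= L\Big(\sum_{j=1}^n A_{1j}\epsilon_j, \alpha_2, \dots, \alpha_r\Big)\\
&= \sum_{j=1}^n A_{1j}\,L(\epsilon_j, \alpha_2, \dots, \alpha_r)\\
&= \sum_{j=1}^n A_{1j}\,L\Big(\epsilon_j, \sum_{k=1}^n A_{2k}\epsilon_k, \alpha_3, \dots, \alpha_r\Big)\\
&= \sum_{j=1}^n\sum_{k=1}^n A_{1j}A_{2k}\,L(\epsilon_j, \epsilon_k, \alpha_3, \dots, \alpha_r)\\
&= \sum_{j,k=1}^n A_{1j}A_{2k}\,L(\epsilon_j, \epsilon_k, \alpha_3, \dots, \alpha_r).
\end{aligned}$$

If we replace $\alpha_3, \dots, \alpha_r$ in turn by their expressions as linear combinations 
of the standard basis vectors, and if we write A(i, j) for $A_{ij}$, we obtain the 
following: 

$$L(\alpha_1, \dots, \alpha_r) = \sum_{j_1, \dots, j_r = 1}^n A(1, j_1)\cdots A(r, j_r)\,L(\epsilon_{j_1}, \dots, \epsilon_{j_r}). \tag{5-26}$$

In (5-26), there is one term for each r-tuple $J = (j_1, \dots, j_r)$ of positive 
integers between 1 and n. There are $n^r$ such r-tuples. Thus L is completely 
determined by (5-26) and the particular values: 

$$c_J = L(\epsilon_{j_1}, \dots, \epsilon_{j_r})$$

assigned to the $n^r$ elements $(\epsilon_{j_1}, \dots, \epsilon_{j_r})$. It is also easy to see that if for 
each r-tuple J we choose an element $c_J$ of K then 

$$L(\alpha_1, \dots, \alpha_r) = \sum_J A(1, j_1)\cdots A(r, j_r)\,c_J \tag{5-27}$$

defines an r-linear form on $K^n$. 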
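Formula (5-27) can be tried out directly. The Python sketch below is only an illustration, assuming K is the integers and using 0-based indices in place of 1, ..., n; the name `form_from_coefficients` is hypothetical. It stores the $n^r$ values $c_J$ in a dictionary and evaluates L by summing one term per r-tuple J.

```python
from itertools import product


def form_from_coefficients(c, n, r):
    """Return the r-linear form on K^n defined by (5-27).

    `c` maps r-tuples J of 0-based indices to c_J = L(e_{j1}, ..., e_{jr});
    tuples missing from `c` are treated as c_J = 0.  Vectors are tuples of length n.
    """
    def L(*alphas):
        if len(alphas) != r:
            raise ValueError("expected %d arguments" % r)
        total = 0
        # One term A(1, j1) ... A(r, jr) c_J for each of the n**r tuples J.
        for J in product(range(n), repeat=r):
            term = c.get(J, 0)
            for i, j in enumerate(J):
                term *= alphas[i][j]  # A(i, j) is coordinate j of alpha_i
            total += term
        return total
    return L


# A bilinear form on K^2 with c_(0,0)=1, c_(0,1)=2, c_(1,0)=3, c_(1,1)=4.
L = form_from_coefficients({(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}, n=2, r=2)
e = [(1, 0), (0, 1)]
print(L(e[0], e[1]))      # 2, the value c_(0,1)
print(L((1, 1), (1, 0)))  # 4 = c_(0,0) + c_(1,0)
```

The loop visits all $n^r$ tuples, which mirrors the count in the text. It is meant for checking small cases by hand rather than for efficiency.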


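Suppose that L is a multilinear function on $V^r$ and M is a multilinear 
function on $V^s$. We define a function $L \otimes M$ on $V^{r+s}$ by 

$$(L\otimes M)(\alpha_1, \dots, \alpha_{r+s}) = L(\alpha_1, \dots, \alpha_r)\,M(\alpha_{r+1}, \dots, \alpha_{r+s}). \tag{5-28}$$

If we think of $V^{r+s}$ as $V^r \times V^s$, then for $\alpha$ in $V^r$ and $\beta$ in $V^s$ 

$$(L\otimes M)(\alpha, \beta) = L(\alpha)M(\beta).$$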
It is clear that $L \otimes M$ is multilinear on $V^{r+s}$. The function $L \otimes M$ is 
called the tensor product of L and M. The tensor product is not com- 
mutative: in general $M \otimes L \ne L \otimes M$. However, 
the tensor product does relate nicely to the module operations in $M^r$ 
and $M^s$. 
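To see concretely that the order of the factors matters, here is a small Python sketch of (5-28). It is only an illustration, assuming integer scalars and 0-based coordinates. The helper names `tensor` and `coordinate` are hypothetical, and a form is represented simply as a Python function of its vector arguments.

```python
def tensor(L, r, M, s):
    """(L (x) M)(a_1, ..., a_{r+s}) = L(a_1, ..., a_r) * M(a_{r+1}, ..., a_{r+s}), as in (5-28)."""
    def LM(*alphas):
        if len(alphas) != r + s:
            raise ValueError("expected %d arguments" % (r + s))
        return L(*alphas[:r]) * M(*alphas[r:])
    return LM


def coordinate(j):
    """The standard coordinate function f_j on K^n (0-based index j)."""
    return lambda alpha: alpha[j]


f1, f2 = coordinate(0), coordinate(1)
e1, e2 = (1, 0), (0, 1)
print(tensor(f1, 1, f2, 1)(e1, e2))  # f1(e1) f2(e2) = 1
print(tensor(f2, 1, f1, 1)(e1, e2))  # f2(e1) f1(e2) = 0
```

Associativity, by contrast, holds because it reduces to associativity of multiplication in K. The test below checks one instance of it.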


Lemma. Let L, $L_1$ be r-linear forms on V, let M, $M_1$ be s-linear forms 
on V and let c be an element of K. 

(a) $(cL + L_1)\otimes M = c(L\otimes M) + L_1\otimes M$; 

(b) $L\otimes(cM + M_1) = c(L\otimes M) + L\otimes M_1$. 


Proof. Exercise. 


Tensoring is associative, i.e., if L, M and N are (respectively) r-, s- 
and t-linear forms on V, then 

$$(L\otimes M)\otimes N = L\otimes(M\otimes N).$$

This is immediate from the fact that the multiplication in K is associative. 
Therefore, if $L_1, L_2, \dots, L_k$ are multilinear functions on $V^{r_1}, \dots, V^{r_k}$, 
then the tensor product 

$$L = L_1\otimes\cdots\otimes L_k$$

is unambiguously defined as a multilinear function on $V^r$, where 
$r = r_1 + \cdots + r_k$. We mentioned a particular case of this earlier. If $f_1, \dots, f_r$ 
are linear functions on V, then the tensor product 

$$L = f_1\otimes\cdots\otimes f_r$$

is given by 

$$L(\alpha_1, \dots, \alpha_r) = f_1(\alpha_1)\cdots f_r(\alpha_r).$$


Theorem 6. Let K be a commutative ring with identity. If V is a free 
K-module of rank n then $M^r(V)$ is a free K-module of rank $n^r$; in fact, if 
$\{f_1, \dots, f_n\}$ is a basis for the dual module V*, the $n^r$ tensor products 

$$f_{j_1}\otimes\cdots\otimes f_{j_r}, \qquad 1 \le j_1 \le n, \ \dots, \ 1 \le j_r \le n$$

form a basis for $M^r(V)$. 

Proof. Let $\{f_1, \dots, f_n\}$ be an ordered basis for V* which is dual 
to the basis $\{\beta_1, \dots, \beta_n\}$ for V. For each vector $\alpha$ in V we have 

$$\alpha = f_1(\alpha)\beta_1 + \cdots + f_n(\alpha)\beta_n.$$

We now make the calculation carried out in Example 11. If L is an r-linear 
form on V and $\alpha_1, \dots, \alpha_r$ are elements of V, then by (5-26) 

$$L(\alpha_1, \dots, \alpha_r) = \sum_{j_1, \dots, j_r} f_{j_1}(\alpha_1)\cdots f_{j_r}(\alpha_r)\,L(\beta_{j_1}, \dots, \beta_{j_r}).$$

In other words, 

$$L = \sum_{j_1, \dots, j_r} L(\beta_{j_1}, \dots, \beta_{j_r})\,f_{j_1}\otimes\cdots\otimes f_{j_r}. \tag{5-29}$$

This shows that the $n^r$ tensor products 

$$E_J = f_{j_1}\otimes\cdots\otimes f_{j_r} \tag{5-30}$$

given by the r-tuples $J = (j_1, \dots, j_r)$ span the module $M^r(V)$. We see 
that the various r-forms $E_J$ are independent, as follows. Suppose that for 
each J we have an element $c_J$ in K and we form the multilinear function 

$$L = \sum_J c_J E_J. \tag{5-31}$$

Notice that if $I = (i_1, \dots, i_r)$, then 

$$E_J(\beta_{i_1}, \dots, \beta_{i_r}) = \begin{cases} 0, & I \ne J\\ 1, & I = J. \end{cases}$$

Therefore we see from (5-31) that 

$$c_I = L(\beta_{i_1}, \dots, \beta_{i_r}). \tag{5-32}$$

In particular, if L = 0 then $c_I = 0$ for each r-tuple I. ∎ 


Definition. Let L be an r-linear form on a K-module V. We say that L 
is alternating if $L(\alpha_1, \dots, \alpha_r) = 0$ whenever $\alpha_i = \alpha_j$ with $i \ne j$. 


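If L is an alternating multilinear function on $V^r$, then 

$$L(\alpha_1, \dots, \alpha_i, \dots, \alpha_j, \dots, \alpha_r) = -L(\alpha_1, \dots, \alpha_j, \dots, \alpha_i, \dots, \alpha_r).$$

(Expand $0 = L(\dots, \alpha_i + \alpha_j, \dots, \alpha_i + \alpha_j, \dots)$ by linearity in the ith and jth places.) 
In other words, if we transpose two of the vectors (with different indices) 
in the r-tuple $(\alpha_1, \dots, \alpha_r)$ the associated value of L changes sign. Since 
every permutation $\sigma$ is a product of transpositions, we see that 
$L(\alpha_{\sigma1}, \dots, \alpha_{\sigma r}) = (\operatorname{sgn}\sigma)\,L(\alpha_1, \dots, \alpha_r)$. 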

We denote by $A^r(V)$ the collection of all alternating r-linear forms 
on V. It should be clear that $A^r(V)$ is a submodule of $M^r(V)$. 
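The rule $L(\alpha_{\sigma1}, \dots, \alpha_{\sigma r}) = (\operatorname{sgn}\sigma)L(\alpha_1, \dots, \alpha_r)$ can be checked numerically for the determinant form. The sketch below assumes integer entries and encodes permutations 0-based as tuples. It computes sgn by counting inversions, which is one standard choice rather than anything prescribed by the text. The names `sgn`, `det_form` and `permuted` are hypothetical.

```python
from itertools import permutations


def sgn(p):
    """Sign of a permutation p of (0, ..., r-1): (-1) ** (number of inversions)."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det_form(*rows):
    """D(a_1, ..., a_n): Leibniz expansion of the determinant with rows a_1, ..., a_n."""
    n = len(rows)
    total = 0
    for p in permutations(range(n)):
        term = sgn(p)
        for i in range(n):
            term *= rows[i][p[i]]
        total += term
    return total


def permuted(L, sigma):
    """L_sigma(a_1, ..., a_r) = L(a_{sigma 1}, ..., a_{sigma r}); sigma is 0-based."""
    return lambda *alphas: L(*(alphas[k] for k in sigma))


rows = [(2, 1, 0), (1, 3, 1), (0, 1, 4)]
for sigma in permutations(range(3)):
    # Permuting the arguments of an alternating form multiplies it by sgn(sigma).
    assert permuted(det_form, sigma)(*rows) == sgn(sigma) * det_form(*rows)
print(det_form(*rows))  # 18
```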


EXAMPLE 12. Earlier in this chapter, we showed that on the module 
$K^n$ there is precisely one alternating n-linear form D with the property 
that $D(\epsilon_1, \dots, \epsilon_n) = 1$. We also showed in Theorem 2 that if L is any 
form in $A^n(K^n)$ then 

$$L = L(\epsilon_1, \dots, \epsilon_n)\,D.$$

In other words, $A^n(K^n)$ is a free K-module of rank 1. We also developed 
an explicit formula (5-15) for D. In terms of the notation we are now 
using, that formula may be written 

$$D = \sum_\sigma (\operatorname{sgn}\sigma)\,f_{\sigma1}\otimes\cdots\otimes f_{\sigma n} \tag{5-33}$$

where $f_1, \dots, f_n$ are the standard coordinate functions on $K^n$ and the sum 
is extended over the n! different permutations $\sigma$ of the set $\{1, \dots, n\}$. 


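If we write the determinant of a matrix A as 

$$\det A = \sum_\sigma (\operatorname{sgn}\sigma)\,A(\sigma1, 1)\cdots A(\sigma n, n)$$

then we obtain a different expression for D: 

$$\begin{aligned}
D(\alpha_1, \dots, \alpha_n) &= \sum_\sigma (\operatorname{sgn}\sigma)\,f_1(\alpha_{\sigma1})\cdots f_n(\alpha_{\sigma n})\\
&= \sum_\sigma (\operatorname{sgn}\sigma)\,L(\alpha_{\sigma1}, \dots, \alpha_{\sigma n})
\end{aligned} \tag{5-34}$$

where $L = f_1\otimes\cdots\otimes f_n$. 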


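There is a general method for associating an alternating form with 
a multilinear form. If L is an r-linear form on a module V and if $\sigma$ is a 
permutation of $\{1, \dots, r\}$, we obtain another r-linear function $L_\sigma$ by 
defining 

$$L_\sigma(\alpha_1, \dots, \alpha_r) = L(\alpha_{\sigma1}, \dots, \alpha_{\sigma r}).$$

If L happens to be alternating, then $L_\sigma = (\operatorname{sgn}\sigma)L$. Now, for each L in 
$M^r(V)$ we define a function $\pi_r L$ in $M^r(V)$ by 

$$\pi_r L = \sum_\sigma (\operatorname{sgn}\sigma)\,L_\sigma \tag{5-35}$$

that is, 

$$(\pi_r L)(\alpha_1, \dots, \alpha_r) = \sum_\sigma (\operatorname{sgn}\sigma)\,L(\alpha_{\sigma1}, \dots, \alpha_{\sigma r}). \tag{5-36}$$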


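Lemma. $\pi_r$ is a linear transformation from $M^r(V)$ into $A^r(V)$. If L 
is in $A^r(V)$ then $\pi_r L = r!\,L$. 

Proof. Let $\tau$ be any permutation of $\{1, \dots, r\}$. Then 

$$\begin{aligned}
(\pi_r L)(\alpha_{\tau1}, \dots, \alpha_{\tau r}) &= \sum_\sigma (\operatorname{sgn}\sigma)\,L(\alpha_{\tau\sigma1}, \dots, \alpha_{\tau\sigma r})\\
&= (\operatorname{sgn}\tau)\sum_\sigma (\operatorname{sgn}\tau\sigma)\,L(\alpha_{\tau\sigma1}, \dots, \alpha_{\tau\sigma r}).
\end{aligned}$$

As $\sigma$ runs (once) over all permutations of $\{1, \dots, r\}$, so does $\tau\sigma$. Therefore, 

$$(\pi_r L)(\alpha_{\tau1}, \dots, \alpha_{\tau r}) = (\operatorname{sgn}\tau)\,(\pi_r L)(\alpha_1, \dots, \alpha_r).$$

Thus $\pi_r L$ is an alternating form. 
If L is in $A^r(V)$, then $L(\alpha_{\sigma1}, \dots, \alpha_{\sigma r}) = (\operatorname{sgn}\sigma)L(\alpha_1, \dots, \alpha_r)$ for 
each $\sigma$; hence $\pi_r L = r!\,L$. ∎ 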


In (5-33) we showed that the determinant function D in $A^n(K^n)$ is 

$$D = \pi_n(f_1\otimes\cdots\otimes f_n)$$

where $f_1, \dots, f_n$ are the standard coordinate functions on $K^n$. There is 
an important remark we should make in connection with the last lemma. 
If K is a field of characteristic zero, so that r! is invertible in K, then 
$\pi_r$ maps $M^r(V)$ onto $A^r(V)$. In fact, in that case it is more natural from one 
point of view to use the map $\pi_1 = (1/r!)\,\pi_r$ rather than $\pi_r$, because $\pi_1$ is a 
projection of $M^r(V)$ onto $A^r(V)$, i.e., a linear map of $M^r(V)$ onto $A^r(V)$ 
such that $\pi_1(L) = L$ if and only if L is in $A^r(V)$. 
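The lemma and the remark about $(1/r!)\pi_r$ can be checked numerically. The sketch below is only an illustration, assuming rational scalars (Python's `fractions.Fraction`) and 0-based permutations. The names `pi` and `projection` are hypothetical stand-ins for $\pi_r$ and $(1/r!)\pi_r$.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def sgn(p):
    """Sign of a permutation of (0, ..., r-1), by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def pi(L, r):
    """pi_r L = sum over sigma in S_r of (sgn sigma) L_sigma, following (5-35)/(5-36)."""
    def piL(*alphas):
        return sum(sgn(s) * L(*(alphas[k] for k in s)) for s in permutations(range(r)))
    return piL


def projection(L, r):
    """(1/r!) pi_r L; meaningful because r! is invertible in the rationals."""
    piL = pi(L, r)
    return lambda *alphas: Fraction(piL(*alphas), factorial(r))


def L(a, b, c):
    """f1 (x) f2 (x) f3 on Q^3: a product of coordinates, not alternating."""
    return a[0] * b[1] * c[2]


D = pi(L, 3)                    # the determinant form, as in the display above
u, v, w = (1, 2, 0), (0, 1, 1), (1, 0, 1)
print(D(u, v, w))               # 3, the determinant of the rows u, v, w
print(D(u, u, w))               # 0, since pi_3 L is alternating
```

Applying `pi` again to the alternating form `D` multiplies it by 3! = 6, while `projection` leaves alternating forms unchanged. The test below checks both facts.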
Theorem 7. Let K be a commutative ring with identity and let V be 
a free K-module of rank n. If r > n, then $A^r(V) = \{0\}$. If $1 \le r \le n$, then 
$A^r(V)$ is a free K-module of rank $\binom{n}{r}$. 


Proof. Let $\{\beta_1, \dots, \beta_n\}$ be an ordered basis for V with dual 
basis $\{f_1, \dots, f_n\}$. If L is in $M^r(V)$, we have 

$$L = \sum_J L(\beta_{j_1}, \dots, \beta_{j_r})\,f_{j_1}\otimes\cdots\otimes f_{j_r} \tag{5-37}$$

where the sum extends over all r-tuples $J = (j_1, \dots, j_r)$ of integers be- 
tween 1 and n. If L is alternating, then 

$$L(\beta_{j_1}, \dots, \beta_{j_r}) = 0$$

whenever two of the subscripts $j_i$ are the same. If r > n, then in each 
r-tuple J some integer must be repeated. Thus $A^r(V) = \{0\}$ if r > n. 

Now suppose $1 \le r \le n$. If L is in $A^r(V)$, the sum in (5-37) need be 
extended only over the r-tuples J for which $j_1, \dots, j_r$ are distinct, because 
all other terms are 0. Each r-tuple of distinct integers between 1 and n is 
a permutation of an r-tuple $J = (j_1, \dots, j_r)$ such that $j_1 < \cdots < j_r$. 
This special type of r-tuple is called an r-shuffle of $\{1, \dots, n\}$. There are 

$$\binom{n}{r} = \frac{n!}{r!\,(n-r)!}$$

such shuffles. 


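Suppose we fix an r-shuffle J. Let $L_J$ be the sum of all the terms in 
(5-37) corresponding to permutations of the shuffle J. If $\sigma$ is a permutation 
of $\{1, \dots, r\}$, then 

$$L(\beta_{j_{\sigma1}}, \dots, \beta_{j_{\sigma r}}) = (\operatorname{sgn}\sigma)\,L(\beta_{j_1}, \dots, \beta_{j_r}).$$

Thus 

$$L_J = L(\beta_{j_1}, \dots, \beta_{j_r})\,D_J \tag{5-38}$$

where 

$$\begin{aligned}
D_J &= \sum_\sigma (\operatorname{sgn}\sigma)\,f_{j_{\sigma1}}\otimes\cdots\otimes f_{j_{\sigma r}}\\
&= \pi_r(f_{j_1}\otimes\cdots\otimes f_{j_r}).
\end{aligned} \tag{5-39}$$

We see from (5-39) that each $D_J$ is alternating and that 

$$L = \sum_{\text{shuffles } J} L(\beta_{j_1}, \dots, \beta_{j_r})\,D_J \tag{5-40}$$

for every L in $A^r(V)$. The assertion is that the $\binom{n}{r}$ forms $D_J$ constitute a 
basis for $A^r(V)$. We have seen that they span $A^r(V)$. It is easy to see that 
they are independent, as follows. If $I = (i_1, \dots, i_r)$ and $J = (j_1, \dots, j_r)$ 
are shuffles, then 

$$D_J(\beta_{i_1}, \dots, \beta_{i_r}) = \begin{cases} 1, & I = J\\ 0, & I \ne J. \end{cases} \tag{5-41}$$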
Suppose we have a scalar $c_J$ for each shuffle J and we define 

$$L = \sum_J c_J D_J.$$

From (5-40) and (5-41) we obtain 

$$c_I = L(\beta_{i_1}, \dots, \beta_{i_r}).$$

In particular, if L = 0 then $c_I = 0$ for each shuffle I. ∎ 


Corollary. If V is a free K-module of rank n, then $A^n(V)$ is a free 
K-module of rank 1. If T is a linear operator on V, there is a unique element 
c in K such that 

$$L(T\alpha_1, \dots, T\alpha_n) = c\,L(\alpha_1, \dots, \alpha_n)$$

for every alternating n-linear form L on V. 

Proof. If L is in $A^n(V)$, then clearly 

$$L_T(\alpha_1, \dots, \alpha_n) = L(T\alpha_1, \dots, T\alpha_n)$$

defines an alternating n-linear form $L_T$. Let M be a generator for the rank 
1 module $A^n(V)$. Each L in $A^n(V)$ is uniquely expressible as L = aM for 
some a in K. In particular, $M_T = cM$ for a certain c. For L = aM we have 

$$L_T = (aM)_T = aM_T = a(cM) = c(aM) = cL. \qquad ∎$$


Of course, the element c in the last corollary is called the determinant 
of T. From (5-39) for the case r = n (when there is only one shuffle 
$J = (1, \dots, n)$) we see that the determinant of T is the determinant of 
the matrix which represents T in any ordered basis $\{\beta_1, \dots, \beta_n\}$. Let us 
see why. The representing matrix has i, j entry 

$$A_{ij} = f_j(T\beta_i)$$

so that 

$$D_J(T\beta_1, \dots, T\beta_n) = \sum_\sigma (\operatorname{sgn}\sigma)\,A(1, \sigma1)\cdots A(n, \sigma n) = \det A.$$

On the other hand, 

$$D_J(T\beta_1, \dots, T\beta_n) = (\det T)\,D_J(\beta_1, \dots, \beta_n) = \det T.$$

The point of these remarks is that via Theorem 7 and its corollary we 
obtain a definition of the determinant of a linear operator which does not 
presume knowledge of determinants of matrices. Determinants of matrices 
can be defined in terms of determinants of operators instead of the other 
way around. 
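The corollary can be tried out on a concrete operator. The sketch below assumes the standard basis of $Q^3$ and uses a made-up operator `T`, which is hypothetical and chosen only for illustration. It computes $\det T$ as $D(T\beta_1, T\beta_2, T\beta_3)$, then checks $D(T\alpha_1, T\alpha_2, T\alpha_3) = (\det T)\,D(\alpha_1, \alpha_2, \alpha_3)$ for one choice of vectors. The rows of the matrix `A` below are the vectors $T\beta_i$, matching $A_{ij} = f_j(T\beta_i)$.

```python
from itertools import permutations


def sgn(p):
    """Sign of a permutation of (0, ..., n-1), by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det_form(*vectors):
    """D(a_1, ..., a_n) on K^n: Leibniz expansion with the a_i as rows."""
    n = len(vectors)
    total = 0
    for p in permutations(range(n)):
        term = sgn(p)
        for i in range(n):
            term *= vectors[i][p[i]]
        total += term
    return total


def T(alpha):
    """A hypothetical operator on Q^3: T(x, y, z) = (2x + y, x + 3y + z, y + 4z)."""
    x, y, z = alpha
    return (2 * x + y, x + 3 * y + z, y + 4 * z)


basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
# Since D(beta_1, beta_2, beta_3) = 1, det T = D(T beta_1, T beta_2, T beta_3).
det_T = det_form(*[T(b) for b in basis])

# The corollary: D(T a_1, T a_2, T a_3) = (det T) D(a_1, a_2, a_3).
alphas = [(1, 2, 0), (0, 1, 1), (1, 0, 1)]
lhs = det_form(*[T(a) for a in alphas])
rhs = det_T * det_form(*alphas)
print(det_T, lhs, rhs)  # 18 54 54
```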
We want to say a bit more about the special alternating r-linear 
forms $D_J$, which we associated with a basis $\{f_1, \dots, f_n\}$ for V* in (5-39). 
It is important to understand that $D_J(\alpha_1, \dots, \alpha_r)$ is the determinant of 
a certain r × r matrix. If 

$$A_{ij} = f_j(\alpha_i), \qquad 1 \le i \le r, \ 1 \le j \le n,$$

that is, if 

$$\alpha_i = A_{i1}\beta_1 + \cdots + A_{in}\beta_n, \qquad 1 \le i \le r$$

and J is the r-shuffle $(j_1, \dots, j_r)$, then 

$$D_J(\alpha_1, \dots, \alpha_r) = \sum_\sigma (\operatorname{sgn}\sigma)\,A(1, j_{\sigma1})\cdots A(r, j_{\sigma r}) = \det\begin{bmatrix} A(1, j_1) & \cdots & A(1, j_r)\\ \vdots & & \vdots\\ A(r, j_1) & \cdots & A(r, j_r) \end{bmatrix}. \tag{5-42}$$

Thus $D_J(\alpha_1, \dots, \alpha_r)$ is the determinant of the r × r matrix formed from 
columns $j_1, \dots, j_r$ of the r × n matrix which has (the coordinate n-tuples 
of) $\alpha_1, \dots, \alpha_r$ as its rows. Another notation which is sometimes used for 
this determinant is 

$$D_J(\alpha_1, \dots, \alpha_r) = \frac{\partial(\alpha_1, \dots, \alpha_r)}{\partial(\beta_{j_1}, \dots, \beta_{j_r})}. \tag{5-43}$$

In this notation, the proof of Theorem 7 shows that every alternating 
r-linear form L can be expressed relative to a basis $\{\beta_1, \dots, \beta_n\}$ by the 
equation 

$$L(\alpha_1, \dots, \alpha_r) = \sum_{j_1 < \cdots < j_r} \frac{\partial(\alpha_1, \dots, \alpha_r)}{\partial(\beta_{j_1}, \dots, \beta_{j_r})}\,L(\beta_{j_1}, \dots, \beta_{j_r}). \tag{5-44}$$
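Equations (5-42) and (5-44) can be checked on a small example in the standard basis of $Z^4$. The sketch below is only an illustration, using 0-based indices. It takes the r-shuffles to be the increasing tuples produced by `itertools.combinations`. The names `D_J` and `expand` and the sample coefficient table `B` are hypothetical.

```python
from itertools import combinations, permutations
from math import comb


def sgn(p):
    """Sign of a permutation of (0, ..., n-1), by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(matrix):
    """Leibniz determinant of a square matrix given as a list of rows."""
    n = len(matrix)
    total = 0
    for p in permutations(range(n)):
        term = sgn(p)
        for i in range(n):
            term *= matrix[i][p[i]]
        total += term
    return total


def D_J(J, *alphas):
    """(5-42): the r x r minor in columns J of the matrix with rows alpha_1, ..., alpha_r."""
    return det([[alpha[j] for j in J] for alpha in alphas])


def expand(L, n, r, *alphas):
    """Right side of (5-44) in the standard basis of K^n: a sum over the r-shuffles J."""
    basis = [tuple(int(k == j) for k in range(n)) for j in range(n)]
    return sum(D_J(J, *alphas) * L(*(basis[j] for j in J))
               for J in combinations(range(n), r))


# A sample alternating bilinear form on Z^4: L(a, b) = sum B[i][j] (a_i b_j - a_j b_i).
B = [[0, 1, 2, 0], [3, 0, 1, 1], [0, 5, 0, 2], [1, 0, 4, 0]]


def L(a, b):
    return sum(B[i][j] * (a[i] * b[j] - a[j] * b[i]) for i in range(4) for j in range(4))


a, b = (1, 2, 0, 3), (0, 1, 4, 1)
print(L(a, b) == expand(L, 4, 2, a, b))  # True
```

The number of shuffles is $\binom{n}{r}$. When r > n there are none, which matches $A^r(V) = \{0\}$ in Theorem 7.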


5.7. The Grassman Ring 


Many of the important properties of determinants and alternating 
multilinear forms are best described in terms of a multiplication operation 
on forms, called the exterior product. If L and M are, respectively, alter- 
nating r- and s-linear forms on the module V, we have an associated product 
of L and M, the tensor product $L \otimes M$. This is not an alternating form 
unless L = 0 or M = 0; however, we have a natural way of projecting it 
into $A^{r+s}(V)$. It appears that 

$$L\cdot M = \pi_{r+s}(L\otimes M) \tag{5-45}$$

should be the ‘natural’ multiplication of alternating forms. But, is it? 
Let us take a specific example. Suppose that V is the module $K^n$ and 
$f_1, \dots, f_n$ are the standard coordinate functions on $K^n$. If $i \ne j$, then 

$$f_i\cdot f_j = \pi_2(f_i\otimes f_j)$$

is the (determinant) function 

$$D_{ij} = f_i\otimes f_j - f_j\otimes f_i$$

given by (5-39). Now suppose k is an index different from i and j. Then 

$$\begin{aligned}
D_{ij}\cdot f_k &= \pi_3[(f_i\otimes f_j - f_j\otimes f_i)\otimes f_k]\\
&= \pi_3(f_i\otimes f_j\otimes f_k) - \pi_3(f_j\otimes f_i\otimes f_k).
\end{aligned}$$


The proof of the lemma following equation (5-36) shows that for any 
r-linear form L and any permutation $\sigma$ of $\{1, \dots, r\}$ 

$$\pi_r(L_\sigma) = (\operatorname{sgn}\sigma)\,\pi_r(L).$$

Hence, $D_{ij}\cdot f_k = 2\pi_3(f_i\otimes f_j\otimes f_k)$. By a similar computation, $f_i\cdot D_{jk} = 
2\pi_3(f_i\otimes f_j\otimes f_k)$. Thus we have 

$$(f_i\cdot f_j)\cdot f_k = f_i\cdot(f_j\cdot f_k)$$
and all of this looks very promising. But there is a catch. Despite the 
computation that we have just completed, the putative multiplication in 
(5-45) is not associative. In fact, if l is an index different from i, j, k, then 
one can calculate that 

$$D_{ij}\cdot D_{kl} = 4\pi_4(f_i\otimes f_j\otimes f_k\otimes f_l)$$

and that 

$$(D_{ij}\cdot f_k)\cdot f_l = 12\pi_4(f_i\otimes f_j\otimes f_k\otimes f_l).$$

Thus, in general 

$$(f_i\cdot f_j)\cdot(f_k\cdot f_l) \ne [(f_i\cdot f_j)\cdot f_k]\cdot f_l$$

and we see that our first attempt to find a multiplication has produced a 
non-associative operation. 
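The two coefficients 4 and 12 can be confirmed by brute force. Since $\pi_4(f_1\otimes f_2\otimes f_3\otimes f_4)(\epsilon_1, \dots, \epsilon_4) = 1$, it is enough to evaluate both products at the standard basis of $Z^4$. The sketch below is an illustration only, with 0-based indices. The helpers `pi`, `tensor` and `dot` are hypothetical, and `dot` implements the tentative product (5-45) literally.

```python
from itertools import permutations


def sgn(p):
    """Sign of a permutation of (0, ..., r-1), by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def pi(L, r):
    """pi_r L = sum over sigma in S_r of (sgn sigma) L_sigma."""
    def piL(*alphas):
        return sum(sgn(s) * L(*(alphas[k] for k in s)) for s in permutations(range(r)))
    return piL


def tensor(L, r, M, s):
    """(L (x) M)(a_1, ..., a_{r+s}) = L(a_1, ..., a_r) M(a_{r+1}, ..., a_{r+s})."""
    return lambda *alphas: L(*alphas[:r]) * M(*alphas[r:])


def dot(L, r, M, s):
    """The tentative product (5-45): L . M = pi_{r+s}(L (x) M)."""
    return pi(tensor(L, r, M, s), r + s)


n = 4
e = [tuple(int(k == j) for k in range(n)) for j in range(n)]
f = [lambda alpha, j=j: alpha[j] for j in range(n)]  # coordinate functions f_1, ..., f_4

D12 = dot(f[0], 1, f[1], 1)
D34 = dot(f[2], 1, f[3], 1)
grouped_pairs = dot(D12, 2, D34, 2)                  # (f1 . f2) . (f3 . f4)
left_nested = dot(dot(D12, 2, f[2], 1), 3, f[3], 1)  # ((f1 . f2) . f3) . f4
print(grouped_pairs(*e), left_nested(*e))  # 4 12
```

The value 12 also follows from (5-46) below: $(D_{ij}\cdot f_k)\cdot f_l = \pi_4(2\pi_3(f_i\otimes f_j\otimes f_k)\otimes f_l) = 2\cdot 3!\cdot 1!\,\pi_4(f_i\otimes f_j\otimes f_k\otimes f_l)$.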

The reader should not be surprised if he finds it rather tedious to give 
a direct verification of the two equations showing non-associativity. This 
is typical of the subject, and it is also typical that there is a general fact 
which considerably simplifies the work. 

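Suppose L is an r-linear form and that M is an s-linear form on the 
module V. Then 

$$\begin{aligned}
\pi_{r+s}[(\pi_r L)\otimes(\pi_s M)] &= \pi_{r+s}\Big(\sum_{\sigma,\tau} (\operatorname{sgn}\sigma)(\operatorname{sgn}\tau)\,L_\sigma\otimes M_\tau\Big)\\
&= \sum_{\sigma,\tau} (\operatorname{sgn}\sigma)(\operatorname{sgn}\tau)\,\pi_{r+s}(L_\sigma\otimes M_\tau)
\end{aligned}$$

where $\sigma$ varies over the symmetric group, $S_r$, of all permutations of 
$\{1, \dots, r\}$, and $\tau$ varies over $S_s$. Each pair $\sigma, \tau$ defines an element $(\sigma, \tau)$ 
of $S_{r+s}$ which permutes the first r elements of $\{1, \dots, r+s\}$ according 
to $\sigma$ and the last s elements according to $\tau$. It is clear that 

$$\operatorname{sgn}(\sigma, \tau) = (\operatorname{sgn}\sigma)(\operatorname{sgn}\tau)$$

and that 

$$(L\otimes M)_{(\sigma,\tau)} = L_\sigma\otimes M_\tau.$$

Therefore 

$$\pi_{r+s}[(\pi_r L)\otimes(\pi_s M)] = \sum_{\sigma,\tau} \operatorname{sgn}(\sigma, \tau)\,\pi_{r+s}[(L\otimes M)_{(\sigma,\tau)}].$$

Now we have already observed that 

$$\operatorname{sgn}(\sigma, \tau)\,\pi_{r+s}[(L\otimes M)_{(\sigma,\tau)}] = \pi_{r+s}(L\otimes M).$$

Since there are $r!\,s!$ pairs $(\sigma, \tau)$, it follows that 

$$\pi_{r+s}[(\pi_r L)\otimes(\pi_s M)] = r!\,s!\,\pi_{r+s}(L\otimes M). \tag{5-46}$$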
This formula simplifies a number of computations. For example, suppose 
we have an r-shuffle $I = (i_1, \dots, i_r)$ and an s-shuffle $J = (j_1, \dots, j_s)$. To 
make things simple, assume, in addition, that 

$$i_1 < \cdots < i_r < j_1 < \cdots < j_s.$$

Then we have the associated determinant functions 

$$D_I = \pi_r(E_I), \qquad D_J = \pi_s(E_J)$$

where $E_I$ and $E_J$ are given by (5-30). Using (5-46), we see immediately that 

$$D_I\cdot D_J = \pi_{r+s}[\pi_r(E_I)\otimes\pi_s(E_J)] = r!\,s!\,\pi_{r+s}(E_I\otimes E_J).$$

Since $E_I\otimes E_J = E_{I\cup J}$, it follows that 

$$D_I\cdot D_J = r!\,s!\,D_{I\cup J}.$$


This suggests that the lack of associativity for the multiplication (5-45) 
results from the fact that $D_I\cdot D_J \ne D_{I\cup J}$. After all, the product of $D_I$ 
and $D_J$ ought to be $D_{I\cup J}$. To repair the situation, we should define a new 
product, the exterior product (or wedge product) of an alternating 
r-linear form L and an alternating s-linear form M by 

$$L\wedge M = \frac{1}{r!\,s!}\,\pi_{r+s}(L\otimes M). \tag{5-47}$$

We then have 

$$D_I\wedge D_J = D_{I\cup J}$$

for the determinant functions on $K^n$, and, if there is any justice at all, we 
must have found the proper multiplication of alternating multilinear 
forms. Unfortunately, (5-47) fails to make sense for the most general case 
under consideration, since we may not be able to divide by $r!\,s!$ in the 
ring K. If K is a field of characteristic zero, then (5-47) is meaningful, and 
one can proceed quite rapidly to show that the wedge product is associative. 


Theorem 8. Let K be a field of characteristic zero and V a vector space 
over K. Then the exterior product is an associative operation on the alternating 
multilinear forms on V. In other words, if L, M, and N are alternating 
multilinear forms on V of degrees r, s, and t, respectively, then 

$$(L\wedge M)\wedge N = L\wedge(M\wedge N).$$

Proof. It follows from (5-47) that $cd(L\wedge M) = cL\wedge dM$ for 
any scalars c and d. Hence 

$$r!\,s!\,t!\,[(L\wedge M)\wedge N] = r!\,s!\,(L\wedge M)\wedge t!\,N$$

and since $\pi_t(N) = t!\,N$, it results that 

$$\begin{aligned}
r!\,s!\,t!\,[(L\wedge M)\wedge N] &= \pi_{r+s}(L\otimes M)\wedge\pi_t(N)\\
&= \frac{1}{(r+s)!\,t!}\,\pi_{r+s+t}[\pi_{r+s}(L\otimes M)\otimes\pi_t(N)].
\end{aligned}$$

From (5-46) we now see that 

$$r!\,s!\,t!\,[(L\wedge M)\wedge N] = \pi_{r+s+t}(L\otimes M\otimes N).$$

By a similar computation 

$$r!\,s!\,t!\,[L\wedge(M\wedge N)] = \pi_{r+s+t}(L\otimes M\otimes N)$$

and therefore, $(L\wedge M)\wedge N = L\wedge(M\wedge N)$. ∎ 
Now we return to the general case, in which it is only assumed that K 
is a commutative ring with identity. Our first problem is to replace (5-47) 
by an equivalent definition which works in general. If L and M are alter- 
nating multilinear forms of degrees r and s respectively, we shall construct 
a canonical alternating multilinear form $L\wedge M$ of degree r + s such that 

$$r!\,s!\,(L\wedge M) = \pi_{r+s}(L\otimes M).$$

Let us recall how we define $\pi_{r+s}(L\otimes M)$. With each permutation $\sigma$ 
of $\{1, \dots, r+s\}$ we associate the multilinear function 

$$(\operatorname{sgn}\sigma)(L\otimes M)_\sigma \tag{5-48}$$

where 

$$(L\otimes M)_\sigma(\alpha_1, \dots, \alpha_{r+s}) = (L\otimes M)(\alpha_{\sigma1}, \dots, \alpha_{\sigma(r+s)})$$

and we sum the functions (5-48) over all permutations $\sigma$. There are (r + s)! 
permutations; however, since L and M are alternating, many of the func- 
tions (5-48) are the same. In fact there are at most 

$$\frac{(r+s)!}{r!\,s!}$$

distinct functions (5-48). Let us see why. Let $S_{r+s}$ be the set of permuta- 
tions of $\{1, \dots, r+s\}$, i.e., let $S_{r+s}$ be the symmetric group of degree 
r + s. As in the proof of (5-46), we distinguish the subset G that consists 
of the permutations $\sigma$ which permute the sets $\{1, \dots, r\}$ and $\{r+1, \dots, 
r+s\}$ within themselves. In other words, $\sigma$ is in G if $1 \le \sigma i \le r$ for each 
i between 1 and r. (It necessarily follows that $r + 1 \le \sigma j \le r + s$ for 
each j between r + 1 and r + s.) Now G is a subgroup of $S_{r+s}$, that is, if 
$\sigma$ and $\tau$ are in G then $\sigma\tau^{-1}$ is in G. Evidently G has $r!\,s!$ members. 
We have a map 

$$\psi\colon S_{r+s} \to M^{r+s}(V)$$

defined by 

$$\psi(\sigma) = (\operatorname{sgn}\sigma)(L\otimes M)_\sigma.$$

Since L and M are alternating, 

$$\psi(\gamma) = L\otimes M$$

for every $\gamma$ in G. Therefore, since $(N_\sigma)_\tau = N_{\tau\sigma}$ for any (r + s)-linear form 
N on V, we have 

$$\psi(\tau\gamma) = \psi(\tau), \qquad \tau \text{ in } S_{r+s}, \ \gamma \text{ in } G.$$

This says that the map $\psi$ is constant on each (left) coset $\tau G$ of the sub- 
group G. If $\tau_1$ and $\tau_2$ are in $S_{r+s}$, the cosets $\tau_1G$ and $\tau_2G$ are either identical 
or disjoint, according as $\tau_2^{-1}\tau_1$ is in G or is not in G. Each coset contains 
$r!\,s!$ elements; hence, there are 

$$\frac{(r+s)!}{r!\,s!}$$

distinct cosets. If $S_{r+s}/G$ denotes the collection of cosets then $\psi$ defines 
a function on $S_{r+s}/G$, i.e., by what we have shown, there is a function $\tilde{\psi}$ 
on that set so that 

$$\psi(\tau) = \tilde{\psi}(\tau G)$$

for every $\tau$ in $S_{r+s}$. If H is a left coset of G, then $\tilde{\psi}(H) = \psi(\tau)$ for every 
$\tau$ in H. 

We now define the exterior product of the alternating multilinear 
forms L and M of degrees r and s by setting 

$$L\wedge M = \sum_H \tilde{\psi}(H) \tag{5-49}$$

where H varies over $S_{r+s}/G$. Another way to phrase the definition of 
$L\wedge M$ is the following. Let S be any set of permutations of $\{1, \dots, r+s\}$ 
which contains exactly one element from each left coset of G. Then 

$$L\wedge M = \sum_\sigma (\operatorname{sgn}\sigma)(L\otimes M)_\sigma \tag{5-50}$$

where $\sigma$ varies over S. Clearly 

$$r!\,s!\,L\wedge M = \pi_{r+s}(L\otimes M)$$

so that the new definition is equivalent to (5-47) when K is a field of 
characteristic zero. 
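Definition (5-50) involves no division, so it can be computed over the integers. The sketch below is only an illustration, using 0-based indices. For the coset representatives S it takes the "shuffle" permutations with $\sigma1 < \cdots < \sigma r$ and $\sigma(r+1) < \cdots < \sigma(r+s)$: each left coset $\tau G$ contains exactly one of them, because $\tau G$ fixes the set $\{\tau1, \dots, \tau r\}$. This is one convenient choice of S among many. The names `shuffles`, `wedge` and `det` are hypothetical.

```python
from itertools import combinations, permutations


def sgn(p):
    """Sign of a permutation of (0, ..., n-1), by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def shuffles(r, s):
    """One permutation from each left coset of G in S_{r+s} (0-based):
    the one with sigma(1) < ... < sigma(r) and sigma(r+1) < ... < sigma(r+s)."""
    n = r + s
    for first in combinations(range(n), r):
        yield first + tuple(k for k in range(n) if k not in first)


def wedge(L, r, M, s):
    """Exterior product (5-50) of an alternating r-form L and s-form M; no division by r!s!."""
    representatives = list(shuffles(r, s))

    def LM(*alphas):
        return sum(
            sgn(sigma)
            * L(*(alphas[k] for k in sigma[:r]))
            * M(*(alphas[k] for k in sigma[r:]))
            for sigma in representatives
        )
    return LM


def det(rows):
    """Leibniz determinant, used only for comparison with the determinant function D."""
    n = len(rows)
    total = 0
    for p in permutations(range(n)):
        term = sgn(p)
        for i in range(n):
            term *= rows[i][p[i]]
        total += term
    return total


f = [lambda alpha, j=j: alpha[j] for j in range(4)]  # coordinate functions f_1, ..., f_4
a, b, c, d = (1, 2, 0, 3), (0, 1, 4, 1), (2, 0, 1, 1), (1, 1, 1, 0)

f12 = wedge(f[0], 1, f[1], 1)  # D_12
f34 = wedge(f[2], 1, f[3], 1)  # D_34
print(wedge(f12, 2, f34, 2)(a, b, c, d) == det([a, b, c, d]))  # True: D_12 ^ D_34 = D
```

The test below also checks one instance of associativity (Theorem 9) and of the sign rule $L\wedge M = (-1)^{rs}M\wedge L$, all with integer arithmetic.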


Theorem 9. Let K be a commutative ring with identity and let V be 
a module over K. Then the exterior product is an associative operation on the 
alternating multilinear forms on V. In other words, if L, M, and N are 
alternating multilinear forms on V of degrees r, s, and t, respectively, then 


$$(L\wedge M)\wedge N = L\wedge(M\wedge N).$$
Proof. Although the proof of Theorem 8 does not apply here, 
it does suggest how to handle the general case. Let G(r, s, t) be the sub- 
group of $S_{r+s+t}$ that consists of the permutations which permute the sets 

$$\{1, \dots, r\}, \quad \{r+1, \dots, r+s\}, \quad \{r+s+1, \dots, r+s+t\}$$

within themselves. Then $(\operatorname{sgn}\mu)(L\otimes M\otimes N)_\mu$ is the same multilinear 
function for all $\mu$ in a given left coset of G(r, s, t). Choose one element 
from each left coset of G(r, s, t), and let E be the sum of the corresponding 
terms $(\operatorname{sgn}\mu)(L\otimes M\otimes N)_\mu$. Then E is independent of the way in which 
the representatives $\mu$ are chosen, and 

$$r!\,s!\,t!\,E = \pi_{r+s+t}(L\otimes M\otimes N).$$

We shall show that $(L\wedge M)\wedge N$ and $L\wedge(M\wedge N)$ are both equal to E. 
Let G(r + s, t) be the subgroup of $S_{r+s+t}$ that permutes the sets 

$$\{1, \dots, r+s\}, \quad \{r+s+1, \dots, r+s+t\}$$

within themselves. Let T be any set of permutations of $\{1, \dots, r+s+t\}$ 
which contains exactly one element from each left coset of G(r + s, t). 
By (5-50) 

$$(L\wedge M)\wedge N = \sum_\tau (\operatorname{sgn}\tau)[(L\wedge M)\otimes N]_\tau$$

where the sum is extended over the permutations $\tau$ in T. Now let G(r, s) 
be the subgroup of $S_{r+s}$ that permutes the sets 

$$\{1, \dots, r\}, \quad \{r+1, \dots, r+s\}$$

within themselves. Let S be any set of permutations of $\{1, \dots, r+s\}$ 
which contains exactly one element from each left coset of G(r, s). From 
(5-50) and what we have shown above, it follows that 

$$(L\wedge M)\wedge N = \sum_{\sigma,\tau} (\operatorname{sgn}\sigma)(\operatorname{sgn}\tau)[(L\otimes M)_\sigma\otimes N]_\tau$$

where the sum is extended over all pairs $\sigma, \tau$ in S × T. If we agree to 
identify each $\sigma$ in $S_{r+s}$ with the element of $S_{r+s+t}$ which agrees with $\sigma$ on 
$\{1, \dots, r+s\}$ and is the identity on $\{r+s+1, \dots, r+s+t\}$, then 
we may write 

$$(L\wedge M)\wedge N = \sum_{\sigma,\tau} \operatorname{sgn}(\sigma\tau)\,[(L\otimes M\otimes N)_\sigma]_\tau.$$

But, 

$$[(L\otimes M\otimes N)_\sigma]_\tau = (L\otimes M\otimes N)_{\tau\sigma}.$$

Therefore 

$$(L\wedge M)\wedge N = \sum_{\sigma,\tau} \operatorname{sgn}(\tau\sigma)\,(L\otimes M\otimes N)_{\tau\sigma}.$$

Now suppose we have 

$$\tau_1\sigma_1 = \tau_2\sigma_2\gamma$$

with $\sigma_i$ in S, $\tau_i$ in T, and $\gamma$ in G(r, s, t). Then $\tau_2^{-1}\tau_1 = \sigma_2\gamma\sigma_1^{-1}$, and since 
$\sigma_2\gamma\sigma_1^{-1}$ lies in G(r + s, t), it follows that $\tau_1$ and $\tau_2$ are in the same left coset 
of G(r + s, t). Therefore, $\tau_1 = \tau_2$, and $\sigma_1 = \sigma_2\gamma$. But this implies that $\sigma_1$ 
and $\sigma_2$ (regarded as elements of $S_{r+s}$) lie in the same coset of G(r, s); hence 
$\sigma_1 = \sigma_2$. Therefore, the products $\tau\sigma$ corresponding to the 

$$\frac{(r+s+t)!}{(r+s)!\,t!}\cdot\frac{(r+s)!}{r!\,s!}$$

pairs $(\tau, \sigma)$ in T × S are all distinct and lie in distinct cosets of G(r, s, t). 
Since there are exactly 

$$\frac{(r+s+t)!}{r!\,s!\,t!}$$

left cosets of G(r, s, t) in $S_{r+s+t}$, it follows that $(L\wedge M)\wedge N = E$. By 
an analogous argument, $L\wedge(M\wedge N) = E$ as well. ∎ 


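EXAMPLE 13. The exterior product is closely related to certain for- 
mulas for evaluating determinants known as the Laplace expansions. 
Let K be a commutative ring with identity and n a positive integer. Sup- 
pose that $1 \le r < n$, and let L be the alternating r-linear form on $K^n$ 
defined by 

$$L(\alpha_1, \dots, \alpha_r) = \det\begin{bmatrix} A_{11} & \cdots & A_{1r}\\ \vdots & & \vdots\\ A_{r1} & \cdots & A_{rr} \end{bmatrix}.$$

If s = n − r and M is the alternating s-linear form 

$$M(\alpha_1, \dots, \alpha_s) = \det\begin{bmatrix} A_{1(r+1)} & \cdots & A_{1n}\\ \vdots & & \vdots\\ A_{s(r+1)} & \cdots & A_{sn} \end{bmatrix}$$

then $L\wedge M = D$, the determinant function on $K^n$. This is immediate 
from the fact that $L\wedge M$ is an alternating n-linear form and (as can be 
seen) 

$$(L\wedge M)(\epsilon_1, \dots, \epsilon_n) = 1.$$

If we now describe $L\wedge M$ in the correct way, we obtain one Laplace 
expansion for the determinant of an n × n matrix over K. 

In the permutation group $S_n$, let G be the subgroup which permutes 
the sets $\{1, \dots, r\}$ and $\{r+1, \dots, n\}$ within themselves. Each left 
coset of G contains precisely one permutation $\sigma$ such that $\sigma1 < \sigma2 < \cdots < 
\sigma r$ and $\sigma(r+1) < \cdots < \sigma n$. The sign of this permutation is given by 

$$\operatorname{sgn}\sigma = (-1)^{\sigma1 + \cdots + \sigma r - r(r+1)/2},$$

since exactly $\sigma i - i$ of the integers $\sigma(r+1), \dots, \sigma n$ are less than $\sigma i$, so that $\sigma$ 
has $\sum_{i=1}^{r}(\sigma i - i)$ inversions. 

The wedge product $L\wedge M$ is given by 

$$(L\wedge M)(\alpha_1, \dots, \alpha_n) = \sum_\sigma (\operatorname{sgn}\sigma)\,L(\alpha_{\sigma1}, \dots, \alpha_{\sigma r})\,M(\alpha_{\sigma(r+1)}, \dots, \alpha_{\sigma n})$$

where the sum is taken over a collection of $\sigma$'s, one from each coset of G. 
Therefore, 

$$(L\wedge M)(\alpha_1, \dots, \alpha_n) = \sum_{j_1 < \cdots < j_r} e_J\,L(\alpha_{j_1}, \dots, \alpha_{j_r})\,M(\alpha_{k_1}, \dots, \alpha_{k_s})$$

where 

$$e_J = (-1)^{j_1 + \cdots + j_r - r(r+1)/2}, \qquad k_i = \sigma(r+i).$$

In other words, 

$$\det A = \sum_{j_1 < \cdots < j_r} e_J \det\begin{bmatrix} A_{j_1 1} & \cdots & A_{j_1 r}\\ \vdots & & \vdots\\ A_{j_r 1} & \cdots & A_{j_r r} \end{bmatrix} \det\begin{bmatrix} A_{k_1 (r+1)} & \cdots & A_{k_1 n}\\ \vdots & & \vdots\\ A_{k_s (r+1)} & \cdots & A_{k_s n} \end{bmatrix}.$$

This is one Laplace expansion. Others may be obtained by replacing the 
sets $\{1, \dots, r\}$ and $\{r+1, \dots, n\}$ by two different complementary 
sets of indices. 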
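The expansion above can be checked against the Leibniz formula. The sketch below is only an illustration, assuming an integer matrix and 0-based indices, so the exponent of $e_J$ is computed from the 1-based values $j_i + 1$. It expands along the first r columns, choosing the rows $j_1 < \cdots < j_r$ exactly as in the formula. The name `laplace` is hypothetical.

```python
from itertools import combinations, permutations


def sgn(p):
    """Sign of a permutation of (0, ..., n-1), by counting inversions."""
    inversions = sum(1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j])
    return -1 if inversions % 2 else 1


def det(rows):
    """Leibniz determinant of a square matrix given as a list of rows."""
    n = len(rows)
    total = 0
    for p in permutations(range(n)):
        term = sgn(p)
        for i in range(n):
            term *= rows[i][p[i]]
        total += term
    return total


def laplace(A, r):
    """Laplace expansion of det A along the first r columns, as in Example 13.

    Sums e_J * det(rows J, columns 1..r) * det(rows K, columns r+1..n)
    over row shuffles J, where K is the complementary set of rows.
    """
    n = len(A)
    total = 0
    for J in combinations(range(n), r):
        K = [k for k in range(n) if k not in J]
        # e_J = (-1)^(j_1 + ... + j_r - r(r+1)/2) with 1-based indices j_i
        exponent = sum(j + 1 for j in J) - r * (r + 1) // 2
        e_J = -1 if exponent % 2 else 1
        top = det([[A[j][col] for col in range(r)] for j in J])
        bottom = det([[A[k][col] for col in range(r, n)] for k in K])
        total += e_J * top * bottom
    return total


A = [[2, 1, 0, 3], [1, 3, 1, 0], [0, 1, 4, 2], [5, 0, 1, 1]]
print([laplace(A, r) == det(A) for r in (1, 2, 3)])  # [True, True, True]
```

With r = 1 this reduces to the familiar cofactor expansion along the first column.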


If V is a K-module, we may put the various form modules $A^r(V)$ 
together and use the exterior product to define a ring. For simplicity, we 
shall do this only for the case of a free K-module of rank n. The modules 
$A^r(V)$ are then trivial for r > n. We define 

$$A(V) = A^0(V)\oplus A^1(V)\oplus\cdots\oplus A^n(V).$$

This is an external direct sum, something which we have not discussed 
previously. The elements of A(V) are the (n + 1)-tuples $(L_0, \dots, L_n)$ 
with $L_r$ in $A^r(V)$. Addition and multiplication by elements of K are defined 
as one would expect for (n + 1)-tuples. Incidentally, $A^0(V) = K$. If we 
identify $A^r(V)$ with the (n + 1)-tuples $(0, \dots, 0, L, 0, \dots, 0)$ where L 
is in $A^r(V)$, then $A^r(V)$ is a submodule of A(V) and the direct sum 
decomposition 

$$A(V) = A^0(V)\oplus\cdots\oplus A^n(V)$$

holds in the usual sense. Since $A^r(V)$ is a free K-module of rank $\binom{n}{r}$, we 
see that A(V) is a free K-module and 

$$\operatorname{rank} A(V) = \sum_{r=0}^{n}\binom{n}{r} = 2^n.$$

The exterior product defines a multiplication in A(V): Use the exterior 
product on forms and extend it linearly to A(V). It distributes over the 
addition of A(V) and gives A(V) the structure of a ring. This ring is the 
Grassman ring over V*. It is not a commutative ring, e.g., if L, M are 
respectively in $A^r$ and $A^s$, then 

$$L\wedge M = (-1)^{rs}\,M\wedge L.$$


But, the Grassman ring is important in several parts of mathematics. 


6. Elementary Canonical Forms 


6.1. Introduction 


We have mentioned earlier that our principal aim is to study linear 
transformations on finite-dimensional vector spaces. By this time, we have 
seen many specific examples of linear transformations, and we have proved 
a few theorems about the general linear transformation. In the finite- 
dimensional case we have utilized ordered bases to represent such trans- 
formations by matrices, and this representation adds to our insight into 
their behavior. We have explored the vector space L(V, W), consisting of 
the linear transformations from one space into another, and we have 
explored the linear algebra L(V, V), consisting of the linear transformations 
of a space into itself. 

In the next two chapters, we shall be preoccupied with linear operators. 
Our program is to select a single linear operator T on a finite-dimensional 
vector space V and to ‘take it apart to see what makes it tick.’ At this 
early stage, it is easiest to express our goal in matrix language: Given the 
linear operator T, find an ordered basis for V in which the matrix of T 
assumes an especially simple form. 

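Here is an illustration of what we have in mind. Perhaps the simplest matrices to work with, beyond the scalar multiples of the identity, are the diagonal matrices:

$$D = \begin{bmatrix} c_1 & 0 & 0 & \cdots & 0 \\ 0 & c_2 & 0 & \cdots & 0 \\ 0 & 0 & c_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & c_n \end{bmatrix}. \tag{6-1}$$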
Let $T$ be a linear operator on an $n$-dimensional space $V$. If we could find an ordered basis $\mathfrak{B} = \{\alpha_1, \ldots, \alpha_n\}$ for $V$ in which $T$ were represented by a diagonal matrix $D$ (6-1), we would gain considerable information about $T$. For instance, simple numbers associated with $T$, such as the rank of $T$ or the determinant of $T$, could be determined with little more than a glance at the matrix $D$. We could describe explicitly the range and the null space of $T$. Since $[T]_{\mathfrak{B}} = D$ if and only if

$$T\alpha_k = c_k\alpha_k, \qquad k = 1, \ldots, n, \tag{6-2}$$

the range would be the subspace spanned by those $\alpha_k$'s for which $c_k \neq 0$ and the null space would be spanned by the remaining $\alpha_k$'s. Indeed, it seems fair to say that, if we knew a basis $\mathfrak{B}$ and a diagonal matrix $D$ such that $[T]_{\mathfrak{B}} = D$, we could readily answer any question about $T$ which might arise.
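As a small illustration of how much can be read off from $D$, the following Python sketch takes the diagonal entries $c_1, \ldots, c_n$ of $[T]_{\mathfrak{B}}$ (a hypothetical input list `c`) and reports the rank, the determinant, and which basis vectors $\alpha_k$ span the range and the null space, following (6-2). Positions are 0-based, so index `k` stands for $\alpha_{k+1}$.

```python
from math import prod

def read_off_diagonal(c):
    """Facts about T when [T]_B = diag(c[0], ..., c[n-1]) in the ordered basis B."""
    rank = sum(1 for ck in c if ck != 0)                      # number of non-zero c_k
    det = prod(c)                                             # product of the diagonal
    range_basis = [k for k, ck in enumerate(c) if ck != 0]    # alpha_k with c_k != 0
    null_basis = [k for k, ck in enumerate(c) if ck == 0]     # the remaining alpha_k
    return rank, det, range_basis, null_basis
```

For instance, `read_off_diagonal([2, 0, 5, 0])` returns `(2, 0, [0, 2], [1, 3])`: rank 2, determinant 0, range spanned by $\alpha_1, \alpha_3$, null space spanned by $\alpha_2, \alpha_4$.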

Can each linear operator T be represented by a diagonal matrix in 
some ordered basis? If not, for which operators T does such a basis exist? 
How can we find such a basis if there is one? If no such basis exists, what 
is the simplest type of matrix by which we can represent T? These are some 
of the questions which we shall attack in this (and the next) chapter. The 
form of our questions will become more sophisticated as we learn what 
some of the difficulties are. 


6.2. Characteristic Values 


The introductory remarks of the previous section provide us with a 
starting point for our attempt to analyze the general linear operator T. 
We take our cue from (6-2), which suggests that we should study vectors 
which are sent by T into scalar multiples of themselves. 


Definition. Let $V$ be a vector space over the field $F$ and let $T$ be a linear operator on $V$. A characteristic value of $T$ is a scalar $c$ in $F$ such that there is a non-zero vector $\alpha$ in $V$ with $T\alpha = c\alpha$. If $c$ is a characteristic value of $T$, then

(a) any $\alpha$ such that $T\alpha = c\alpha$ is called a characteristic vector of $T$ associated with the characteristic value $c$;

(b) the collection of all $\alpha$ such that $T\alpha = c\alpha$ is called the characteristic space associated with $c$.


Characteristic values are often called characteristic roots, latent roots, 
eigenvalues, proper values, or spectral values. In this book we shall use 
only the name ‘characteristic values.’ 

If $T$ is any linear operator and $c$ is any scalar, the set of vectors $\alpha$ such that $T\alpha = c\alpha$ is a subspace of $V$. It is the null space of the linear transformation $T - cI$. We call $c$ a characteristic value of $T$ if this subspace is different from the zero subspace, i.e., if $T - cI$ fails to be 1:1. If the underlying space $V$ is finite-dimensional, $T - cI$ fails to be 1:1 precisely when its determinant is 0. Let us summarize.


Theorem 1. Let T be a linear operator on a finite-dimensional space V 
and let c be a scalar. The following are equivalent. 


(i) $c$ is a characteristic value of $T$.
(ii) The operator $T - cI$ is singular (not invertible).
(iii) $\det(T - cI) = 0$.


The determinant criterion (iii) is very important because it tells us where to look for the characteristic values of $T$. Since $\det(T - cI)$ is a polynomial of degree $n$ in the variable $c$, we will find the characteristic values as the roots of that polynomial. Let us explain carefully.

If $\mathfrak{B}$ is any ordered basis for $V$ and $A = [T]_{\mathfrak{B}}$, then $T - cI$ is invertible if and only if the matrix $A - cI$ is invertible. Accordingly, we make the following definition.

Definition. If $A$ is an $n \times n$ matrix over the field $F$, a characteristic value of $A$ in $F$ is a scalar $c$ in $F$ such that the matrix $A - cI$ is singular (not invertible).

Since $c$ is a characteristic value of $A$ if and only if $\det(A - cI) = 0$, or equivalently if and only if $\det(cI - A) = 0$, we form the matrix $xI - A$ with polynomial entries, and consider the polynomial $f = \det(xI - A)$. Clearly the characteristic values of $A$ in $F$ are just the scalars $c$ in $F$ such that $f(c) = 0$. For this reason $f$ is called the characteristic polynomial of $A$. It is important to note that $f$ is a monic polynomial which has degree exactly $n$. This is easily seen from the formula for the determinant of a matrix in terms of its entries.
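For readers who want to check such computations mechanically, here is a minimal Python sketch that computes the coefficients of $f = \det(xI - A)$ with exact rational arithmetic. It uses the Faddeev-LeVerrier recursion, a technique not discussed in the text, chosen here only because it avoids symbolic determinants; the function names are hypothetical.

```python
from fractions import Fraction

def mat_mul(A, B):
    """Product of two n x n matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def char_poly(A):
    """Coefficients of det(xI - A), highest degree first (Faddeev-LeVerrier).

    Starting from M_0 = 0 and leading coefficient 1, each step sets
    M_k = A M_{k-1} + (previous coefficient) I, and the next coefficient
    is -trace(A M_k) / k.
    """
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    coeffs = [Fraction(1)]                       # f is monic
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = mat_mul(A, M)
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)] for i in range(n)]
        AM = mat_mul(A, M)
        coeffs.append(-sum(AM[i][i] for i in range(n)) / k)
    return coeffs
```

Applied to the matrix of Example 2, `char_poly([[3, 1, -1], [2, 2, -1], [2, 2, 0]])` returns the coefficients 1, -5, 8, -4 (as `Fraction` values), that is, $x^3 - 5x^2 + 8x - 4$.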


Lemma. Similar matrices have the same characteristic polynomial.

Proof. If $B = P^{-1}AP$, then

$$\begin{aligned} \det(xI - B) &= \det(xI - P^{-1}AP) \\ &= \det\big(P^{-1}(xI - A)P\big) \\ &= \det P^{-1} \cdot \det(xI - A) \cdot \det P \\ &= \det(xI - A). \qquad \blacksquare \end{aligned}$$
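A quick numerical sanity check of the lemma in the $2 \times 2$ case, as a Python sketch: it uses the formula $\det(xI - A) = x^2 - (\operatorname{trace} A)x + \det A$ (which the text derives for $n = 2$ in the proof of Theorem 4), an arbitrarily chosen invertible $P$, and exact fractions. The helper names are hypothetical.

```python
from fractions import Fraction

def mul2(X, Y):
    """Product of 2 x 2 matrices."""
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2)] for i in range(2)]

def inv2(P):
    """Inverse of an invertible 2 x 2 matrix, with exact fractions."""
    (a, b), (c, d) = P
    det = Fraction(a * d - b * c)
    if det == 0:
        raise ValueError("P is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def char_poly2(A):
    """(1, -trace A, det A): the coefficients of x^2 - (trace A) x + det A."""
    return (1, -(A[0][0] + A[1][1]), A[0][0] * A[1][1] - A[0][1] * A[1][0])

A = [[3, 1], [2, 5]]
P = [[1, 2], [3, 7]]                  # any invertible matrix will do
B = mul2(mul2(inv2(P), A), P)         # B = P^{-1} A P
```

Here $B = \begin{bmatrix} 8 & 13 \\ -1 & 0 \end{bmatrix}$ differs from $A$, yet `char_poly2(B) == char_poly2(A)`: both have characteristic polynomial $x^2 - 8x + 13$.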


This lemma enables us to define sensibly the characteristic polynomial of the operator $T$ as the characteristic polynomial of any $n \times n$ matrix which represents $T$ in some ordered basis for $V$. Just as for matrices, the characteristic values of $T$ will be the roots of the characteristic polynomial for $T$. In particular, this shows us that $T$ cannot have more than $n$ distinct characteristic values. It is important to point out that $T$ may not have any characteristic values.


EXAMPLE 1. Let $T$ be the linear operator on $R^2$ which is represented in the standard ordered basis by the matrix

$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$

The characteristic polynomial for $T$ (or for $A$) is

$$\det(xI - A) = \begin{vmatrix} x & 1 \\ -1 & x \end{vmatrix} = x^2 + 1.$$

Since this polynomial has no real roots, $T$ has no characteristic values. If $U$ is the linear operator on $C^2$ which is represented by $A$ in the standard ordered basis, then $U$ has two characteristic values, $i$ and $-i$. Here we see a subtle point. In discussing the characteristic values of a matrix $A$, we must be careful to stipulate the field involved. The matrix $A$ above has no characteristic values in $R$, but has the two characteristic values $i$ and $-i$ in $C$.
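The dependence on the field can be seen in a few lines of Python, as a sketch: the same quadratic $x^2 + 1$ has no roots when we look only in $R$, but two roots in $C$. The use of floating-point complex arithmetic (the standard `cmath` module) is an assumption of this illustration, not part of the text; the function names are hypothetical.

```python
import cmath

def complex_roots(b, c):
    """Both roots of x^2 + b x + c in C (quadratic formula)."""
    r = cmath.sqrt(b * b - 4 * c)
    return (-b + r) / 2, (-b - r) / 2

def real_roots(b, c):
    """Roots of x^2 + b x + c that lie in R: none when the discriminant is negative."""
    if b * b - 4 * c < 0:
        return ()
    return tuple(z.real for z in complex_roots(b, c))
```

For $x^2 + 1$ (that is, `b = 0`, `c = 1`), `real_roots` returns the empty tuple while `complex_roots` returns $i$ and $-i$.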


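EXAMPLE 2. Let $A$ be the (real) $3 \times 3$ matrix

$$A = \begin{bmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{bmatrix}.$$

Then the characteristic polynomial for $A$ is

$$\begin{vmatrix} x - 3 & -1 & 1 \\ -2 & x - 2 & 1 \\ -2 & -2 & x \end{vmatrix} = x^3 - 5x^2 + 8x - 4 = (x - 1)(x - 2)^2.$$

Thus the characteristic values of $A$ are 1 and 2.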

Suppose that $T$ is the linear operator on $R^3$ which is represented by $A$ in the standard basis. Let us find the characteristic vectors of $T$ associated with the characteristic values 1 and 2. Now

$$A - I = \begin{bmatrix} 2 & 1 & -1 \\ 2 & 1 & -1 \\ 2 & 2 & -1 \end{bmatrix}.$$

It is obvious at a glance that $A - I$ has rank equal to 2 (and hence $T - I$ has nullity equal to 1). So the space of characteristic vectors associated with the characteristic value 1 is one-dimensional. The vector $\alpha_1 = (1, 0, 2)$ spans the null space of $T - I$. Thus $T\alpha = \alpha$ if and only if $\alpha$ is a scalar multiple of $\alpha_1$. Now consider

$$A - 2I = \begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 2 & -2 \end{bmatrix}.$$

Evidently $A - 2I$ also has rank 2, so that the space of characteristic vectors associated with the characteristic value 2 has dimension 1. Evidently $T\alpha = 2\alpha$ if and only if $\alpha$ is a scalar multiple of $\alpha_2 = (1, 1, 2)$.
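The rank computations and the two characteristic vectors found above can be confirmed with a short Python sketch that row-reduces over the rationals; `rank`, `shift`, and `apply` are hypothetical helper names chosen for this illustration.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):    # clear the entries below the pivot
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def shift(A, c):
    """The matrix A - cI."""
    return [[a - c * (i == j) for j, a in enumerate(row)] for i, row in enumerate(A)]

def apply(A, v):
    """The column A v, for v given as a list."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[3, 1, -1], [2, 2, -1], [2, 2, 0]]   # the matrix of Example 2
```

Both `rank(shift(A, 1))` and `rank(shift(A, 2))` equal 2, so each characteristic space is one-dimensional, and `apply(A, [1, 0, 2])`, `apply(A, [1, 1, 2])` return $\alpha_1$ and $2\alpha_2$.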


Definition. Let $T$ be a linear operator on the finite-dimensional space $V$. We say that $T$ is diagonalizable if there is a basis for $V$ each vector of which is a characteristic vector of $T$.


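The reason for the name should be apparent; for, if there is an ordered basis $\mathfrak{B} = \{\alpha_1, \ldots, \alpha_n\}$ for $V$ in which each $\alpha_i$ is a characteristic vector of $T$, then the matrix of $T$ in the ordered basis $\mathfrak{B}$ is diagonal. If $T\alpha_i = c_i\alpha_i$, then

$$[T]_{\mathfrak{B}} = \begin{bmatrix} c_1 & 0 & \cdots & 0 \\ 0 & c_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_n \end{bmatrix}.$$

We certainly do not require that the scalars $c_1, \ldots, c_n$ be distinct; indeed, they may all be the same scalar (when $T$ is a scalar multiple of the identity operator).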

One could also define T to be diagonalizable when the characteristic 
vectors of T span V. This is only superficially different from our definition, 
since we can select a basis out of any spanning set of vectors. 

For Examples 1 and 2 we purposely chose linear operators $T$ on $R^n$ which are not diagonalizable. In Example 1, we have a linear operator on $R^2$ which is not diagonalizable, because it has no characteristic values. In Example 2, the operator $T$ has characteristic values; in fact, the characteristic polynomial for $T$ factors completely over the real number field: $f = (x - 1)(x - 2)^2$. Nevertheless $T$ fails to be diagonalizable. There is only a one-dimensional space of characteristic vectors associated with each of the two characteristic values of $T$. Hence, we cannot possibly form a basis for $R^3$ which consists of characteristic vectors of $T$.

Suppose that $T$ is a diagonalizable linear operator. Let $c_1, \ldots, c_k$ be the distinct characteristic values of $T$. Then there is an ordered basis $\mathfrak{B}$ in which $T$ is represented by a diagonal matrix which has for its diagonal entries the scalars $c_i$, each repeated a certain number of times. If $c_i$ is repeated $d_i$ times, then (we may arrange that) the matrix has the block form

$$[T]_{\mathfrak{B}} = \begin{bmatrix} c_1I_1 & 0 & \cdots & 0 \\ 0 & c_2I_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & c_kI_k \end{bmatrix} \tag{6-3}$$

where $I_j$ is the $d_j \times d_j$ identity matrix. From that matrix we see two things. First, the characteristic polynomial for $T$ is the product of (possibly repeated) linear factors:

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}.$$

If the scalar field $F$ is algebraically closed, e.g., the field of complex numbers, every polynomial over $F$ can be so factored (see Section 4.5); however, if $F$ is not algebraically closed, we are citing a special property of $T$ when we say that its characteristic polynomial has such a factorization. The second thing we see from (6-3) is that $d_i$, the number of times which $c_i$ is repeated as a root of $f$, is equal to the dimension of the space of characteristic vectors associated with the characteristic value $c_i$. That is because the nullity of a diagonal matrix is equal to the number of zeros which it has on its main diagonal, and the matrix $[T - c_iI]_{\mathfrak{B}}$ has $d_i$ zeros on its main diagonal. This relation between the dimension of the characteristic space and the multiplicity of the characteristic value as a root of $f$ does not seem exciting at first; however, it will provide us with a simpler way of determining whether a given operator is diagonalizable.


Lemma. Suppose that $T\alpha = c\alpha$. If $f$ is any polynomial, then $f(T)\alpha = f(c)\alpha$.

Proof. Exercise.
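The lemma is easy to test numerically. The Python sketch below evaluates $f(A)v$ by Horner's rule (our choice of method, not the text's) and compares it with $f(c)v$, using the matrix of Example 2 and its characteristic vectors $\alpha_1 = (1, 0, 2)$ (value 1) and $\alpha_2 = (1, 1, 2)$ (value 2); the polynomial $f = x^3 + 2x - 7$ and all function names are arbitrary choices for this illustration.

```python
def mat_vec(A, v):
    """The column A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def poly_matrix_vector(coeffs, A, v):
    """f(A) v by Horner's rule; coeffs lists f's coefficients, highest degree first."""
    result = [0] * len(v)
    for a in coeffs:
        # result <- A result + a v
        result = [x + a * y for x, y in zip(mat_vec(A, result), v)]
    return result

def poly_scalar(coeffs, c):
    """f(c) by Horner's rule."""
    value = 0
    for a in coeffs:
        value = value * c + a
    return value

A = [[3, 1, -1], [2, 2, -1], [2, 2, 0]]
f = [1, 0, 2, -7]             # x^3 + 2x - 7
alpha1, c1 = [1, 0, 2], 1     # T alpha1 = alpha1
alpha2, c2 = [1, 1, 2], 2     # T alpha2 = 2 alpha2
```

Here $f(1) = -4$ and $f(2) = 5$, and indeed $f(A)\alpha_1 = -4\alpha_1$ and $f(A)\alpha_2 = 5\alpha_2$.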


Lemma. Let $T$ be a linear operator on the finite-dimensional space $V$. Let $c_1, \ldots, c_k$ be the distinct characteristic values of $T$ and let $W_i$ be the space of characteristic vectors associated with the characteristic value $c_i$. If $W = W_1 + \cdots + W_k$, then

$$\dim W = \dim W_1 + \cdots + \dim W_k.$$

In fact, if $\mathfrak{B}_i$ is an ordered basis for $W_i$, then $\mathfrak{B} = (\mathfrak{B}_1, \ldots, \mathfrak{B}_k)$ is an ordered basis for $W$.

Proof. The space $W = W_1 + \cdots + W_k$ is the subspace spanned by all of the characteristic vectors of $T$. Usually when one forms the sum $W$ of subspaces $W_i$, one expects that $\dim W < \dim W_1 + \cdots + \dim W_k$ because of linear relations which may exist between vectors in the various spaces. This lemma states that the characteristic spaces associated with different characteristic values are independent of one another.

Suppose that (for each $i$) we have a vector $\beta_i$ in $W_i$, and assume that $\beta_1 + \cdots + \beta_k = 0$. We shall show that $\beta_i = 0$ for each $i$. Let $f$ be any polynomial. Since $T\beta_i = c_i\beta_i$, the preceding lemma tells us that

$$\begin{aligned} 0 = f(T)0 &= f(T)\beta_1 + \cdots + f(T)\beta_k \\ &= f(c_1)\beta_1 + \cdots + f(c_k)\beta_k. \end{aligned}$$

Choose polynomials $f_1, \ldots, f_k$ such that (the Lagrange polynomials of Section 4.3 will do, since $c_1, \ldots, c_k$ are distinct)

$$f_i(c_j) = \delta_{ij} = \begin{cases} 1, & i = j \\ 0, & i \neq j. \end{cases}$$

Then

$$0 = f_i(T)0 = \sum_j \delta_{ij}\beta_j = \beta_i.$$

Now, let $\mathfrak{B}_i$ be an ordered basis for $W_i$, and let $\mathfrak{B}$ be the sequence $\mathfrak{B} = (\mathfrak{B}_1, \ldots, \mathfrak{B}_k)$. Then $\mathfrak{B}$ spans the subspace $W = W_1 + \cdots + W_k$. Also, $\mathfrak{B}$ is a linearly independent sequence of vectors, for the following reason. Any linear relation between the vectors in $\mathfrak{B}$ will have the form $\beta_1 + \cdots + \beta_k = 0$, where $\beta_i$ is some linear combination of the vectors in $\mathfrak{B}_i$. From what we just did, we know that $\beta_i = 0$ for each $i$. Since each $\mathfrak{B}_i$ is linearly independent, we see that we have only the trivial linear relation between the vectors in $\mathfrak{B}$. ∎
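The proof needs polynomials $f_i$ with $f_i(c_j) = \delta_{ij}$. One standard choice (Lagrange interpolation) is $f_i = \prod_{j \neq i} (x - c_j)/(c_i - c_j)$. Here is a small Python sketch that builds them with exact fractions; the sample values $c_1, c_2, c_3 = 1, 2, 5$ in the usage line and the function names are chosen only for illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Product of polynomials stored as coefficient lists, lowest degree first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_eval(p, x):
    """Value of the polynomial p (lowest degree first) at x."""
    return sum(a * x ** k for k, a in enumerate(p))

def lagrange_polys(cs):
    """For distinct scalars cs, return f_0, ..., f_{k-1} with f_i(cs[j]) = 1 if i == j else 0."""
    polys = []
    for i, ci in enumerate(cs):
        f = [Fraction(1)]
        for j, cj in enumerate(cs):
            if j != i:
                d = Fraction(ci - cj)                # non-zero because the cs are distinct
                f = poly_mul(f, [-cj / d, 1 / d])    # multiply by (x - c_j)/(c_i - c_j)
        polys.append(f)
    return polys
```

For example, `lagrange_polys([1, 2, 5])[0]` is the coefficient list of $(x^2 - 7x + 10)/4$, which takes the values 1, 0, 0 at $x = 1, 2, 5$.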


Theorem 2. Let $T$ be a linear operator on a finite-dimensional space $V$. Let $c_1, \ldots, c_k$ be the distinct characteristic values of $T$ and let $W_i$ be the null space of $T - c_iI$. The following are equivalent.

(i) $T$ is diagonalizable.
(ii) The characteristic polynomial for $T$ is $f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}$ and $\dim W_i = d_i$, $i = 1, \ldots, k$.
(iii) $\dim W_1 + \cdots + \dim W_k = \dim V$.

Proof. We have observed that (i) implies (ii). If the characteristic polynomial $f$ is the product of linear factors, as in (ii), then $d_1 + \cdots + d_k = \dim V$. For, the sum of the $d_i$'s is the degree of the characteristic polynomial, and that degree is $\dim V$. Hence $\dim W_1 + \cdots + \dim W_k = d_1 + \cdots + d_k = \dim V$, and therefore (ii) implies (iii). Suppose (iii) holds. By the lemma, $\dim(W_1 + \cdots + W_k) = \dim V$, so we must have $V = W_1 + \cdots + W_k$, i.e., the characteristic vectors of $T$ span $V$. ∎
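Criterion (iii) gives a mechanical test once the distinct characteristic values are known: compute $\dim W_i = n - \operatorname{rank}(A - c_iI)$ and compare the sum with $n$. Here is a Python sketch with exact arithmetic; the list of characteristic values is supplied by hand and is assumed to be complete and free of repetitions, and the function names are hypothetical.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def is_diagonalizable(A, char_values):
    """Theorem 2 (iii): the nullities of A - cI, over the distinct c, add up to n."""
    n = len(A)
    total = 0
    for c in char_values:
        shifted = [[A[i][j] - c * (i == j) for j in range(n)] for i in range(n)]
        total += n - rank(shifted)          # dim W_i = nullity of A - c_i I
    return total == n
```

For the matrix of Example 2 with values 1 and 2 the nullities are 1 and 1, so the test fails; for the matrix of Example 3 below they are 1 and 2, so it succeeds.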


The matrix analogue of Theorem 2 may be formulated as follows. Let $A$ be an $n \times n$ matrix with entries in a field $F$, and let $c_1, \ldots, c_k$ be the distinct characteristic values of $A$ in $F$. For each $i$, let $W_i$ be the space of column matrices $X$ (with entries in $F$) such that

$$(A - c_iI)X = 0,$$

and let $\mathfrak{B}_i$ be an ordered basis for $W_i$. The bases $\mathfrak{B}_1, \ldots, \mathfrak{B}_k$ collectively string together to form the sequence of columns of a matrix $P$:

$$P = [P_1, P_2, \ldots] = (\mathfrak{B}_1, \ldots, \mathfrak{B}_k).$$

The matrix $A$ is similar over $F$ to a diagonal matrix if and only if $P$ is a square matrix. When $P$ is square, $P$ is invertible and $P^{-1}AP$ is diagonal.


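EXAMPLE 3. Let $T$ be the linear operator on $R^3$ which is represented in the standard ordered basis by the matrix

$$A = \begin{bmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{bmatrix}.$$

Let us indicate how one might compute the characteristic polynomial, using various row and column operations (subtract the third column from the second, take out the factor $x - 2$, add the second row to the third, and expand along the second column):

$$\begin{aligned}
\begin{vmatrix} x - 5 & 6 & 6 \\ 1 & x - 4 & -2 \\ -3 & 6 & x + 4 \end{vmatrix}
&= \begin{vmatrix} x - 5 & 0 & 6 \\ 1 & x - 2 & -2 \\ -3 & 2 - x & x + 4 \end{vmatrix} \\
&= (x - 2)\begin{vmatrix} x - 5 & 0 & 6 \\ 1 & 1 & -2 \\ -3 & -1 & x + 4 \end{vmatrix} \\
&= (x - 2)\begin{vmatrix} x - 5 & 0 & 6 \\ 1 & 1 & -2 \\ -2 & 0 & x + 2 \end{vmatrix} \\
&= (x - 2)\begin{vmatrix} x - 5 & 6 \\ -2 & x + 2 \end{vmatrix} \\
&= (x - 2)(x^2 - 3x + 2) \\
&= (x - 2)^2(x - 1).
\end{aligned}$$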


What are the dimensions of the spaces of characteristic vectors associated with the two characteristic values? We have

$$A - I = \begin{bmatrix} 4 & -6 & -6 \\ -1 & 3 & 2 \\ 3 & -6 & -5 \end{bmatrix}, \qquad A - 2I = \begin{bmatrix} 3 & -6 & -6 \\ -1 & 2 & 2 \\ 3 & -6 & -6 \end{bmatrix}.$$

We know that $A - I$ is singular and obviously $\operatorname{rank}(A - I) \geq 2$. Therefore, $\operatorname{rank}(A - I) = 2$. It is evident that $\operatorname{rank}(A - 2I) = 1$.

Let $W_1$, $W_2$ be the spaces of characteristic vectors associated with the characteristic values 1, 2. We know that $\dim W_1 = 1$ and $\dim W_2 = 2$. By Theorem 2, $T$ is diagonalizable. It is easy to exhibit a basis for $R^3$ in which $T$ is represented by a diagonal matrix. The null space of $T - I$ is spanned by the vector $\alpha_1 = (3, -1, 3)$ and so $\{\alpha_1\}$ is a basis for $W_1$. The null space of $T - 2I$ (i.e., the space $W_2$) consists of the vectors $(x_1, x_2, x_3)$ with $x_1 = 2x_2 + 2x_3$. Thus, one example of a basis for $W_2$ is

$$\alpha_2 = (2, 1, 0), \qquad \alpha_3 = (2, 0, 1).$$

If $\mathfrak{B} = \{\alpha_1, \alpha_2, \alpha_3\}$, then $[T]_{\mathfrak{B}}$ is the diagonal matrix

$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.$$

The fact that $T$ is diagonalizable means that the original matrix $A$ is similar (over $R$) to the diagonal matrix $D$. The matrix $P$ which enables us to change coordinates from the basis $\mathfrak{B}$ to the standard basis is (of course) the matrix which has the transposes of $\alpha_1, \alpha_2, \alpha_3$ as its column vectors:

$$P = \begin{bmatrix} 3 & 2 & 2 \\ -1 & 1 & 0 \\ 3 & 0 & 1 \end{bmatrix}.$$

Furthermore, $AP = PD$, so that

$$P^{-1}AP = D.$$
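The relation $AP = PD$ is easy to confirm by machine. Here is a plain-Python sketch using the matrices of this example (the helper names are hypothetical); since it also finds $\det P = -1 \neq 0$, $P$ is invertible and $P^{-1}AP = D$ follows.

```python
def mat_mul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]
P = [[3, 2, 2], [-1, 1, 0], [3, 0, 1]]      # columns alpha_1, alpha_2, alpha_3
D = [[1, 0, 0], [0, 2, 0], [0, 0, 2]]
```

Both `mat_mul(A, P)` and `mat_mul(P, D)` equal $\begin{bmatrix} 3 & 4 & 4 \\ -1 & 2 & 0 \\ 3 & 0 & 2 \end{bmatrix}$: each column $\alpha_j$ is simply scaled by its characteristic value.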


Exercises 


1. In each of the following cases, let $T$ be the linear operator on $R^2$ which is represented by the matrix $A$ in the standard ordered basis for $R^2$, and let $U$ be the linear operator on $C^2$ represented by $A$ in the standard ordered basis. Find the characteristic polynomial for $T$ and that for $U$, find the characteristic values of each operator, and for each such characteristic value $c$ find a basis for the corresponding space of characteristic vectors.

$$\text{(a) } A = \begin{bmatrix} 1 & 0 \\ 1 & 0 \end{bmatrix}; \qquad \text{(b) } A = \begin{bmatrix} 2 & 3 \\ -1 & 1 \end{bmatrix}; \qquad \text{(c) } A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.$$


2. Let V be an n-dimensional vector space over F. What is the characteristic 
polynomial of the identity operator on V? What is the characteristic polynomial 
for the zero operator? 


3. Let $A$ be an $n \times n$ triangular matrix over the field $F$. Prove that the characteristic values of $A$ are the diagonal entries of $A$, i.e., the scalars $A_{ii}$.


4. Let $T$ be the linear operator on $R^3$ which is represented in the standard ordered basis by the matrix

$$\begin{bmatrix} -9 & 4 & 4 \\ -8 & 3 & 4 \\ -16 & 8 & 7 \end{bmatrix}.$$

Prove that $T$ is diagonalizable by exhibiting a basis for $R^3$, each vector of which is a characteristic vector of $T$.


5. Let

$$A = \begin{bmatrix} 6 & -3 & -2 \\ 4 & -1 & -2 \\ 10 & -5 & -3 \end{bmatrix}.$$

Is $A$ similar over the field $R$ to a diagonal matrix? Is $A$ similar over the field $C$ to a diagonal matrix?
6. Let $T$ be the linear operator on $R^4$ which is represented in the standard ordered basis by the matrix

$$\begin{bmatrix} 0 & 0 & 0 & 0 \\ a & 0 & 0 & 0 \\ 0 & b & 0 & 0 \\ 0 & 0 & c & 0 \end{bmatrix}.$$

Under what conditions on $a$, $b$, and $c$ is $T$ diagonalizable?


7. Let T be a linear operator on the n-dimensional vector space V, and suppose 
that T has n distinct characteristic values. Prove that T is diagonalizable. 


8. Let $A$ and $B$ be $n \times n$ matrices over the field $F$. Prove that if $I - AB$ is invertible, then $I - BA$ is invertible and

$$(I - BA)^{-1} = I + B(I - AB)^{-1}A.$$


9. Use the result of Exercise 8 to prove that, if $A$ and $B$ are $n \times n$ matrices over the field $F$, then $AB$ and $BA$ have precisely the same characteristic values in $F$.


10. Suppose that $A$ is a $2 \times 2$ matrix with real entries which is symmetric ($A^t = A$). Prove that $A$ is similar over $R$ to a diagonal matrix.


11. Let $N$ be a $2 \times 2$ complex matrix such that $N^2 = 0$. Prove that either $N = 0$ or $N$ is similar over $C$ to

$$\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$


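12. Use the result of Exercise 11 to prove the following: If $A$ is a $2 \times 2$ matrix with complex entries, then $A$ is similar over $C$ to a matrix of one of the two types

$$\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}, \qquad \begin{bmatrix} a & 0 \\ 1 & a \end{bmatrix}.$$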


13. Let $V$ be the vector space of all functions from $R$ into $R$ which are continuous, i.e., the space of continuous real-valued functions on the real line. Let $T$ be the linear operator on $V$ defined by

$$(Tf)(x) = \int_0^x f(t)\,dt.$$

Prove that $T$ has no characteristic values.
14. Let $A$ be an $n \times n$ diagonal matrix with characteristic polynomial

$$(x - c_1)^{d_1} \cdots (x - c_k)^{d_k},$$

where $c_1, \ldots, c_k$ are distinct. Let $V$ be the space of $n \times n$ matrices $B$ such that $AB = BA$. Prove that the dimension of $V$ is $d_1^2 + \cdots + d_k^2$.


15. Let $V$ be the space of $n \times n$ matrices over $F$. Let $A$ be a fixed $n \times n$ matrix over $F$. Let $T$ be the linear operator ‘left multiplication by $A$’ on $V$. Is it true that $A$ and $T$ have the same characteristic values?


6.3. Annihilating Polynomials 


In attempting to analyze a linear operator $T$, one of the most useful things to know is the class of polynomials which annihilate $T$. Specifically, suppose $T$ is a linear operator on $V$, a vector space over the field $F$. If $p$ is a polynomial over $F$, then $p(T)$ is again a linear operator on $V$. If $q$ is another polynomial over $F$, then

$$(p + q)(T) = p(T) + q(T), \qquad (pq)(T) = p(T)q(T).$$

Therefore, the collection of polynomials $p$ which annihilate $T$, in the sense that

$$p(T) = 0,$$

is an ideal in the polynomial algebra $F[x]$. It may be the zero ideal, i.e., it may be that $T$ is not annihilated by any non-zero polynomial. But that cannot happen if the space $V$ is finite-dimensional.

Suppose $T$ is a linear operator on the $n$-dimensional space $V$. Look at the first $n^2 + 1$ powers of $T$:

$$I, T, T^2, \ldots, T^{n^2}.$$

This is a sequence of $n^2 + 1$ operators in $L(V, V)$, the space of linear operators on $V$. The space $L(V, V)$ has dimension $n^2$. Therefore, that sequence of $n^2 + 1$ operators must be linearly dependent, i.e., we have

$$c_0I + c_1T + \cdots + c_{n^2}T^{n^2} = 0$$

for some scalars $c_i$ not all zero. So, the ideal of polynomials which annihilate $T$ contains a non-zero polynomial of degree $n^2$ or less.

According to Theorem 5 of Chapter 4, every polynomial ideal consists of all multiples of some fixed monic polynomial, the generator of the ideal. Thus, there corresponds to the operator $T$ a monic polynomial $p$ with this property: If $f$ is a polynomial over $F$, then $f(T) = 0$ if and only if $f = pg$, where $g$ is some polynomial over $F$.


Definition. Let $T$ be a linear operator on a finite-dimensional vector space $V$ over the field $F$. The minimal polynomial for $T$ is the (unique) monic generator of the ideal of polynomials over $F$ which annihilate $T$.


The name ‘minimal polynomial’ stems from the fact that the generator
of a polynomial ideal is characterized by being the monic polynomial of 
minimum degree in the ideal. That means that the minimal polynomial p 
for the linear operator T is uniquely determined by these three properties: 


(1) p is a monic polynomial over the scalar field F. 

(2) p(T) = 0. 

(3) No polynomial over F which annihilates T has smaller degree than 
p has. 


If $A$ is an $n \times n$ matrix over $F$, we define the minimal polynomial for $A$ in an analogous way, as the unique monic generator of the ideal of all polynomials over $F$ which annihilate $A$. If the operator $T$ is represented in some ordered basis by the matrix $A$, then $T$ and $A$ have the same minimal polynomial. That is because $f(T)$ is represented in the basis by the matrix $f(A)$, so that $f(T) = 0$ if and only if $f(A) = 0$.

From the last remark about operators and matrices it follows that similar matrices have the same minimal polynomial. That fact is also clear from the definitions because

$$f(P^{-1}AP) = P^{-1}f(A)P$$

for every polynomial $f$.

There is another basic remark which we should make about minimal polynomials of matrices. Suppose that $A$ is an $n \times n$ matrix with entries in the field $F$. Suppose that $F_1$ is a field which contains $F$ as a subfield. (For example, $A$ might be a matrix with rational entries, while $F_1$ is the field of real numbers. Or, $A$ might be a matrix with real entries, while $F_1$ is the field of complex numbers.) We may regard $A$ either as an $n \times n$ matrix over $F$ or as an $n \times n$ matrix over $F_1$. On the surface, it might appear that we obtain two different minimal polynomials for $A$. Fortunately that is not the case; and we must see why. What is the definition of the minimal polynomial for $A$, regarded as an $n \times n$ matrix over the field $F$? We consider all monic polynomials with coefficients in $F$ which annihilate $A$, and we choose the one of least degree. If $f$ is a monic polynomial over $F$:

$$f = x^k + \sum_{j=0}^{k-1} a_jx^j \tag{6-4}$$

then $f(A) = 0$ merely says that we have a linear relation between the powers of $A$:

$$A^k + a_{k-1}A^{k-1} + \cdots + a_1A + a_0I = 0. \tag{6-5}$$

The degree of the minimal polynomial is the least positive integer $k$ such that there is a linear relation of the form (6-5) between the powers $I, A, \ldots, A^k$. Furthermore, by the uniqueness of the minimal polynomial, there is for that $k$ one and only one relation of the form (6-5); i.e., once the minimal $k$ is determined, there are unique scalars $a_0, \ldots, a_{k-1}$ in $F$ such that (6-5) holds. They are the coefficients of the minimal polynomial.

Now (for each $k$) we have in (6-5) a system of $n^2$ linear equations for the ‘unknowns’ $a_0, \ldots, a_{k-1}$. Since the entries of $A$ lie in $F$, the coefficients of the system of equations (6-5) are in $F$. Therefore, if the system has a solution with $a_0, \ldots, a_{k-1}$ in $F_1$, it has a solution with $a_0, \ldots, a_{k-1}$ in $F$. (See the end of Section 1.4.) It should now be clear that the two minimal polynomials are the same.
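The preceding paragraph is in effect an algorithm: for $k = 1, 2, \ldots$ try to solve (6-5) for $a_0, \ldots, a_{k-1}$, and stop at the first $k$ for which the system is consistent. Here is a Python sketch of that search, using exact rational arithmetic so that all the scalars stay in $F = Q$; the function names are hypothetical, and the linear solver is ordinary Gauss-Jordan elimination.

```python
from fractions import Fraction

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def flatten(M):
    """An n x n matrix as a vector with n^2 entries."""
    return [x for row in M for x in row]

def solve(columns, rhs):
    """Solve sum_j x_j * columns[j] = rhs exactly; return None if there is no solution.

    Assumes the columns are linearly independent, so a solution is unique.
    """
    m, k = len(rhs), len(columns)
    M = [[columns[j][i] for j in range(k)] + [rhs[i]] for i in range(m)]
    pivot_cols, row = [], 0
    for col in range(k):
        p_row = next((r for r in range(row, m) if M[r][col] != 0), None)
        if p_row is None:
            continue
        M[row], M[p_row] = M[p_row], M[row]
        p = M[row][col]
        M[row] = [v / p for v in M[row]]
        for r in range(m):
            if r != row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[row])]
        pivot_cols.append(col)
        row += 1
    # A remaining row reading 0 = (non-zero) means (6-5) has no solution for this k.
    if any(M[r][k] != 0 for r in range(row, m)):
        return None
    x = [Fraction(0)] * k
    for r, col in enumerate(pivot_cols):
        x[col] = M[r][k]
    return x

def minimal_polynomial(A):
    """Monic minimal polynomial of a square matrix, coefficients highest degree first."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    power = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]   # A^0 = I
    lower = [flatten(power)]                    # I, A, ..., A^(k-1) as vectors
    for k in range(1, n * n + 1):
        power = mat_mul(A, power)               # A^k
        a = solve(lower, [-x for x in flatten(power)])
        if a is not None:                       # A^k + a_{k-1}A^{k-1} + ... + a_0 I = 0
            return [Fraction(1)] + a[::-1]
        lower.append(flatten(power))
    raise AssertionError("unreachable: I, A, ..., A^(n^2) are always dependent")
```

For the matrices of Examples 2 and 3 this returns the coefficients of $x^3 - 5x^2 + 8x - 4$ and $x^2 - 3x + 2$ respectively, in agreement with Example 4.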

What do we know thus far about the minimal polynomial for a linear operator on an $n$-dimensional space? Only that its degree does not exceed $n^2$. That turns out to be a rather poor estimate, since the degree cannot exceed $n$. We shall prove shortly that the operator is annihilated by its characteristic polynomial. First, let us observe a more elementary fact.
Theorem 3. Let $T$ be a linear operator on an $n$-dimensional vector space $V$ [or, let $A$ be an $n \times n$ matrix]. The characteristic and minimal polynomials for $T$ [for $A$] have the same roots, except for multiplicities.


Proof. Let $p$ be the minimal polynomial for $T$. Let $c$ be a scalar. What we want to show is that $p(c) = 0$ if and only if $c$ is a characteristic value of $T$.

First, suppose $p(c) = 0$. Then

$$p = (x - c)q$$

where $q$ is a polynomial. Since $\deg q < \deg p$, the definition of the minimal polynomial $p$ tells us that $q(T) \neq 0$. Choose a vector $\beta$ such that $q(T)\beta \neq 0$. Let $\alpha = q(T)\beta$. Then

$$\begin{aligned} 0 &= p(T)\beta \\ &= (T - cI)q(T)\beta \\ &= (T - cI)\alpha \end{aligned}$$

and thus, since $\alpha \neq 0$, $c$ is a characteristic value of $T$.

Now, suppose that $c$ is a characteristic value of $T$, say, $T\alpha = c\alpha$ with $\alpha \neq 0$. As we noted in a previous lemma,

$$p(T)\alpha = p(c)\alpha.$$

Since $p(T) = 0$ and $\alpha \neq 0$, we have $p(c) = 0$. ∎
Let $T$ be a diagonalizable linear operator and let $c_1, \ldots, c_k$ be the distinct characteristic values of $T$. Then it is easy to see that the minimal polynomial for $T$ is the polynomial

$$p = (x - c_1) \cdots (x - c_k).$$

If $\alpha$ is a characteristic vector, then one of the operators $T - c_1I, \ldots, T - c_kI$ sends $\alpha$ into 0. Since these operators commute with one another, it follows that

$$(T - c_1I) \cdots (T - c_kI)\alpha = 0$$

for every characteristic vector $\alpha$. There is a basis for the underlying space which consists of characteristic vectors of $T$; hence

$$p(T) = (T - c_1I) \cdots (T - c_kI) = 0.$$

Thus the minimal polynomial divides $p$; and by Theorem 3 each $c_i$ is a root of the minimal polynomial, so the minimal polynomial is exactly $p$.

What we have concluded is this. If $T$ is a diagonalizable linear operator, then the minimal polynomial for $T$ is a product of distinct linear factors. As we shall soon see, that property characterizes diagonalizable operators.


EXAMPLE 4. Let’s try to find the minimal polynomials for the operators in Examples 1, 2, and 3. We shall discuss them in reverse order. The operator in Example 3 was found to be diagonalizable with characteristic polynomial

$$f = (x - 1)(x - 2)^2.$$

From the preceding paragraph, we know that the minimal polynomial for $T$ is

$$p = (x - 1)(x - 2).$$

The reader might find it reassuring to verify directly that

$$(A - I)(A - 2I) = 0.$$
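For the reader who prefers to let a machine do the multiplication, a minimal Python sketch of that verification for the matrix $A$ of Example 3 (helper names hypothetical):

```python
def mat_mul(X, Y):
    """Product of square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def shift(A, c):
    """The matrix A - cI."""
    return [[a - c * (i == j) for j, a in enumerate(row)] for i, row in enumerate(A)]

A = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]      # Example 3
product = mat_mul(shift(A, 1), shift(A, 2))    # (A - I)(A - 2I)
```

Every entry of `product` is 0, as the theory predicts for a diagonalizable operator with characteristic values 1 and 2.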


In Example 2, the operator $T$ also had the characteristic polynomial $f = (x - 1)(x - 2)^2$. But this $T$ is not diagonalizable, so we don’t know that the minimal polynomial is $(x - 1)(x - 2)$. What do we know about the minimal polynomial in this case? From Theorem 3 we know that its roots are 1 and 2, with some multiplicities allowed. Thus we search for $p$ among polynomials of the form $(x - 1)^k(x - 2)^l$, $k \geq 1$, $l \geq 1$. Try $(x - 1)(x - 2)$:

$$(A - I)(A - 2I) = \begin{bmatrix} 2 & 1 & -1 \\ 2 & 1 & -1 \\ 2 & 2 & -1 \end{bmatrix}\begin{bmatrix} 1 & 1 & -1 \\ 2 & 0 & -1 \\ 2 & 2 & -2 \end{bmatrix} = \begin{bmatrix} 2 & 0 & -1 \\ 2 & 0 & -1 \\ 4 & 0 & -2 \end{bmatrix}.$$

Thus, the minimal polynomial has degree at least 3. So, next we should try either $(x - 1)^2(x - 2)$ or $(x - 1)(x - 2)^2$. The second, being the characteristic polynomial, would seem a less random choice. One can readily compute that $(A - I)(A - 2I)^2 = 0$. Thus the minimal polynomial for $T$ is its characteristic polynomial.
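The three products relevant here can be checked with the same kind of Python sketch (helper names hypothetical). Besides the two products computed in the text, it also shows that the other cubic candidate $(x - 1)^2(x - 2)$ does not annihilate $A$.

```python
def mat_mul(X, Y):
    """Product of square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def shift(A, c):
    """The matrix A - cI."""
    return [[a - c * (i == j) for j, a in enumerate(row)] for i, row in enumerate(A)]

A = [[3, 1, -1], [2, 2, -1], [2, 2, 0]]       # Example 2
N = mat_mul(shift(A, 1), shift(A, 2))         # (A - I)(A - 2I)
M = mat_mul(N, shift(A, 2))                   # (A - I)(A - 2I)^2
K = mat_mul(shift(A, 1), N)                   # (A - I)^2 (A - 2I)
```

`N` is the non-zero matrix displayed above, `M` is the zero matrix, and `K` is again non-zero.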

In Example 1 we discussed the linear operator $T$ on $R^2$ which is represented in the standard basis by the matrix

$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$

The characteristic polynomial is $x^2 + 1$, which has no real roots. To determine the minimal polynomial, forget about $T$ and concentrate on $A$. As a complex $2 \times 2$ matrix, $A$ has the characteristic values $i$ and $-i$. Both roots must appear in the minimal polynomial. Thus the minimal polynomial is divisible by $x^2 + 1$. It is trivial to verify that $A^2 + I = 0$. So the minimal polynomial is $x^2 + 1$.


Theorem 4 (Cayley-Hamilton). Let $T$ be a linear operator on a finite-dimensional vector space $V$. If $f$ is the characteristic polynomial for $T$, then $f(T) = 0$; in other words, the minimal polynomial divides the characteristic polynomial for $T$.


Proof. Later on we shall give two proofs of this result independent of the one to be given here. The present proof, although short, may be difficult to understand. Aside from brevity, it has the virtue of providing an illuminating and far from trivial application of the general theory of determinants developed in Chapter 5.

Let $K$ be the commutative ring with identity consisting of all polynomials in $T$. Of course, $K$ is actually a commutative algebra with identity over the scalar field. Choose an ordered basis $\{\alpha_1, \ldots, \alpha_n\}$ for $V$, and let $A$ be the matrix which represents $T$ in the given basis. Then

$$T\alpha_i = \sum_{j=1}^{n} A_{ji}\alpha_j, \qquad 1 \leq i \leq n.$$

These equations may be written in the equivalent form

$$\sum_{j=1}^{n} (\delta_{ij}T - A_{ji}I)\alpha_j = 0, \qquad 1 \leq i \leq n.$$

Let $B$ denote the element of $K^{n \times n}$ with entries

$$B_{ij} = \delta_{ij}T - A_{ji}I.$$

When $n = 2$,

$$B = \begin{bmatrix} T - A_{11}I & -A_{21}I \\ -A_{12}I & T - A_{22}I \end{bmatrix}$$

and

$$\begin{aligned} \det B &= (T - A_{11}I)(T - A_{22}I) - A_{12}A_{21}I \\ &= T^2 - (A_{11} + A_{22})T + (A_{11}A_{22} - A_{12}A_{21})I \\ &= f(T) \end{aligned}$$

where $f$ is the characteristic polynomial:

$$f = x^2 - (\operatorname{trace} A)x + \det A.$$

For the case $n > 2$, it is also clear that

$$\det B = f(T)$$

since $f$ is the determinant of the matrix $xI - A$ whose entries are the polynomials

$$(xI - A)_{ij} = \delta_{ij}x - A_{ij},$$

and $B$ is obtained from the transpose of $xI - A$ by substituting $T$ for $x$; a matrix and its transpose have the same determinant.


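We wish to show that $f(T) = 0$. In order that $f(T)$ be the zero operator, it is necessary and sufficient that $(\det B)\alpha_k = 0$ for $k = 1, \ldots, n$. By the definition of $B$, the vectors $\alpha_1, \ldots, \alpha_n$ satisfy the equations

$$\sum_{j=1}^{n} B_{ij}\alpha_j = 0, \qquad 1 \leq i \leq n. \tag{6-6}$$

When $n = 2$, it is suggestive to write (6-6) in the form

$$\begin{bmatrix} T - A_{11}I & -A_{21}I \\ -A_{12}I & T - A_{22}I \end{bmatrix}\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

In this case, the classical adjoint, $\operatorname{adj} B$, is the matrix

$$\tilde{B} = \begin{bmatrix} T - A_{22}I & A_{21}I \\ A_{12}I & T - A_{11}I \end{bmatrix}$$

and

$$\tilde{B}B = \begin{bmatrix} \det B & 0 \\ 0 & \det B \end{bmatrix}.$$

Hence, we have

$$(\det B)\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = (\tilde{B}B)\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \tilde{B}\left(B\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix}\right) = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$

In the general case, let $\tilde{B} = \operatorname{adj} B$. Then by (6-6)

$$\sum_{j=1}^{n} \tilde{B}_{ki}B_{ij}\alpha_j = 0$$

for each pair $k$, $i$, and summing on $i$, we have

$$0 = \sum_{i=1}^{n}\sum_{j=1}^{n} \tilde{B}_{ki}B_{ij}\alpha_j = \sum_{j=1}^{n}\left(\sum_{i=1}^{n} \tilde{B}_{ki}B_{ij}\right)\alpha_j.$$

Now $\tilde{B}B = (\det B)I$, so that

$$\sum_{i=1}^{n} \tilde{B}_{ki}B_{ij} = \delta_{kj}\det B.$$

Therefore

$$0 = \sum_{j=1}^{n} \delta_{kj}(\det B)\alpha_j = (\det B)\alpha_k, \qquad 1 \leq k \leq n. \qquad \blacksquare$$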
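The theorem can be spot-checked by evaluating $f(A)$ with Horner's rule. This Python sketch uses the matrices of Examples 2 and 3, which (as computed in those examples) share the characteristic polynomial $x^3 - 5x^2 + 8x - 4$, together with the matrix of Example 1; it is an illustration, not a substitute for the proof, and the function names are hypothetical.

```python
def mat_mul(X, Y):
    """Product of square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][t] * Y[t][j] for t in range(n)) for j in range(n)] for i in range(n)]

def poly_at_matrix(coeffs, A):
    """f(A) by Horner's rule; coeffs lists f's coefficients, highest degree first."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in coeffs:
        # result <- result * A + a I
        result = mat_mul(result, A)
        result = [[result[i][j] + (a if i == j else 0) for j in range(n)] for i in range(n)]
    return result

f = [1, -5, 8, -4]                               # x^3 - 5x^2 + 8x - 4
A2 = [[3, 1, -1], [2, 2, -1], [2, 2, 0]]         # Example 2
A3 = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]      # Example 3
```

Both `poly_at_matrix(f, A2)` and `poly_at_matrix(f, A3)` are zero matrices. Note that $f(A_3) = 0$ even though the minimal polynomial of $A_3$ has degree 2: the theorem only says that the minimal polynomial divides $f$.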


The Cayley-Hamilton theorem is useful to us at this point primarily because it narrows down the search for the minimal polynomials of various operators. If we know the matrix $A$ which represents $T$ in some ordered basis, then we can compute the characteristic polynomial $f$. We know that the minimal polynomial $p$ divides $f$ and that the two polynomials have the same roots. There is no method for computing precisely the roots of a polynomial (unless its degree is small); however, if $f$ factors

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}, \qquad c_1, \ldots, c_k \text{ distinct}, \quad d_i \geq 1 \tag{6-7}$$

then

$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}, \qquad 1 \leq r_j \leq d_j. \tag{6-8}$$

That is all we can say in general. If $f$ is the polynomial (6-7) and has degree $n$, then for every polynomial $p$ as in (6-8) we can find an $n \times n$ matrix which has $f$ as its characteristic polynomial and $p$ as its minimal polynomial. We shall not prove this now. But we want to emphasize the fact that the knowledge that the characteristic polynomial has the form (6-7) tells us that the minimal polynomial has the form (6-8), and it tells us nothing else about $p$.
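Condition (6-8) leaves exactly $d_1 d_2 \cdots d_k$ possibilities for $p$. A tiny Python sketch listing them as exponent tuples $(r_1, \ldots, r_k)$; the function name is hypothetical.

```python
from itertools import product

def candidate_exponents(d):
    """All exponent tuples (r_1, ..., r_k) with 1 <= r_j <= d_j, as allowed by (6-8)."""
    return list(product(*(range(1, dj + 1) for dj in d)))
```

For $f = (x - 1)(x - 2)^2$, `candidate_exponents([1, 2])` gives `[(1, 1), (1, 2)]`, and Example 4 shows that both actually occur: $(1, 1)$ for Example 3 and $(1, 2)$ for Example 2.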


EXAMPLE 5. Let $A$ be the $4 \times 4$ (rational) matrix

$$A = \begin{bmatrix} 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}.$$

The powers of $A$ are easy to compute:

$$A^2 = \begin{bmatrix} 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 \\ 2 & 0 & 2 & 0 \\ 0 & 2 & 0 & 2 \end{bmatrix}, \qquad A^3 = \begin{bmatrix} 0 & 4 & 0 & 4 \\ 4 & 0 & 4 & 0 \\ 0 & 4 & 0 & 4 \\ 4 & 0 & 4 & 0 \end{bmatrix}.$$

Thus $A^3 = 4A$, i.e., if $p = x^3 - 4x = x(x + 2)(x - 2)$, then $p(A) = 0$. The minimal polynomial for $A$ must divide $p$. That minimal polynomial is obviously not of degree 1, since that would mean that $A$ was a scalar multiple of the identity. Hence, the candidates for the minimal polynomial are: $p$, $x(x + 2)$, $x(x - 2)$, $x^2 - 4$. The three quadratic polynomials can be eliminated because it is obvious at a glance that $A^2 \neq -2A$, $A^2 \neq 2A$, $A^2 \neq 4I$. Therefore $p$ is the minimal polynomial for $A$. In particular 0, 2, and $-2$ are the characteristic values of $A$. One of the factors $x$, $x - 2$, $x + 2$ must be repeated twice in the characteristic polynomial. Evidently, $\operatorname{rank}(A) = 2$. Consequently there is a two-dimensional space of characteristic vectors associated with the characteristic value 0. From Theorem 2, it should now be clear that the characteristic polynomial is $x^2(x^2 - 4)$ and that $A$ is similar over the field of rational numbers to the matrix

$$\begin{bmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & -2 \end{bmatrix}.$$

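The computations of Example 5 are easy to confirm by machine. The following sketch assumes SymPy. `minimal_polynomial` is a hypothetical helper of ours: it finds the first power $A^k$ that is a linear combination of $I, A, \dots, A^{k-1}$. By the Cayley-Hamilton theorem some $k \le n$ always works.

```python
import sympy

x = sympy.symbols('x')

A = sympy.Matrix([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])

def minimal_polynomial(M, x):
    """Hypothetical helper: the monic p of least degree with p(M) = 0.
    Stack vec(I), vec(M), ..., vec(M**k) as columns and stop at the first k
    for which these columns are linearly dependent."""
    n = M.shape[0]
    powers = [sympy.eye(n)]
    for k in range(1, n + 1):          # Cayley-Hamilton: k <= n always suffices
        powers.append(powers[-1] * M)
        columns = sympy.Matrix.hstack(*[P.reshape(n * n, 1) for P in powers])
        null = columns.nullspace()
        if null:
            v = null[0] / null[0][k]   # scale so the coefficient of x**k is 1
            return sympy.expand(sum(v[i] * x**i for i in range(k + 1)))

print(A**3 == 4 * A)                 # True
print(minimal_polynomial(A, x))      # x**3 - 4*x
print(A.rank())                      # 2
print(A.charpoly(x).as_expr())       # x**4 - 4*x**2
```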

Exercises 


1. Let V be a finite-dimensional vector space. What is the minimal polynomial 
for the identity operator on V? What is the minimal polynomial for the zero 
operator? 
2. Let a, b, and c be elements of a field F, and let A be the following $3 \times 3$ matrix over F:
$$A = \begin{bmatrix} 0 & 0 & c \\ 1 & 0 & b \\ 0 & 1 & a \end{bmatrix}.$$
Prove that the characteristic polynomial for A is $x^3 - ax^2 - bx - c$ and that this is also the minimal polynomial for A.


3. Let A be the $4 \times 4$ real matrix
$$A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ -1 & -1 & 0 & 0 \\ -2 & -2 & 2 & 1 \\ 1 & 1 & -1 & 0 \end{bmatrix}.$$
Show that the characteristic polynomial for A is $x^2(x - 1)^2$ and that it is also the minimal polynomial.

4. Is the matrix A of Exercise 3 similar over the field of complex numbers to a diagonal matrix?

5. Let V be an n-dimensional vector space and let T be a linear operator on V. Suppose that there exists some positive integer k so that $T^k = 0$. Prove that $T^n = 0$.

6. Find a $3 \times 3$ matrix for which the minimal polynomial is $x^2$.


7. Let n be a positive integer, and let V be the space of polynomials over R which have degree at most n (throw in the 0-polynomial). Let D be the differentiation operator on V. What is the minimal polynomial for D?


8. Let P be the operator on $R^2$ which projects each vector onto the x-axis, parallel to the y-axis: $P(x, y) = (x, 0)$. Show that P is linear. What is the minimal polynomial for P?


9. Let A be an $n \times n$ matrix with characteristic polynomial
$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}.$$
Show that
$$c_1d_1 + \cdots + c_kd_k = \operatorname{trace}(A).$$


10. Let V be the vector space of $n \times n$ matrices over the field F. Let A be a fixed $n \times n$ matrix. Let T be the linear operator on V defined by
$$T(B) = AB.$$
Show that the minimal polynomial for T is the minimal polynomial for A.


11. Let A and B be $n \times n$ matrices over the field F. According to Exercise 9 of Section 6.1, the matrices AB and BA have the same characteristic values. Do they have the same characteristic polynomial? Do they have the same minimal polynomial?


6.4. Invariant Subspaces 


In this section, we shall introduce a few concepts which are useful in attempting to analyze a linear operator. We shall use these ideas to obtain characterizations of diagonalizable (and triangulable) operators in terms of their minimal polynomials.


Definition. Let V be a vector space and T a linear operator on V. If W is a subspace of V, we say that W is invariant under T if for each vector $\alpha$ in W the vector $T\alpha$ is in W, i.e., if $T(W)$ is contained in W.


EXAMPLE 6. If T is any linear operator on V, then V is invariant under T, as is the zero subspace. The range of T and the null space of T are also invariant under T.


EXAMPLE 7. Let F be a field and let D be the differentiation operator on the space $F[x]$ of polynomials over F. Let n be a positive integer and let W be the subspace of polynomials of degree not greater than n. Then W is invariant under D. This is just another way of saying that D is 'degree decreasing.'


EXAMPLE 8. Here is a very useful generalization of Example 6. Let T be a linear operator on V. Let U be any linear operator on V which commutes with T, i.e., $TU = UT$. Let W be the range of U and let N be the null space of U. Both W and N are invariant under T. If $\alpha$ is in the range of U, say $\alpha = U\beta$, then $T\alpha = T(U\beta) = U(T\beta)$ so that $T\alpha$ is in the range of U. If $\alpha$ is in N, then $U(T\alpha) = T(U\alpha) = T(0) = 0$; hence, $T\alpha$ is in N.

A particular type of operator which commutes with T is an operator $U = g(T)$, where g is a polynomial. For instance, we might have $U = T - cI$, where c is a characteristic value of T. The null space of U is familiar to us. We see that this example includes the (obvious) fact that the space of characteristic vectors of T associated with the characteristic value c is invariant under T.


EXAMPLE 9. Let T be the linear operator on $R^2$ which is represented in the standard ordered basis by the matrix
$$A = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$
Then the only subspaces of $R^2$ which are invariant under T are $R^2$ and the zero subspace. Any other invariant subspace would necessarily have dimension 1. But, if W is the subspace spanned by some non-zero vector $\alpha$, the fact that W is invariant under T means that $\alpha$ is a characteristic vector, but A has no real characteristic values.
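The argument can also be phrased as a determinant computation. A non-zero vector $\alpha$ spans an invariant line exactly when $\alpha$ and $T\alpha$ are linearly dependent. The following is a small sketch; SymPy and the symbols a, b for the coordinates of $\alpha$ are our assumptions, not part of the text.

```python
import sympy

a, b, x = sympy.symbols('a b x')
A = sympy.Matrix([[0, -1], [1, 0]])
alpha = sympy.Matrix([a, b])

# The line spanned by a non-zero alpha is invariant only if alpha and A*alpha
# are linearly dependent, i.e. only if det[alpha | A*alpha] = 0.
d = sympy.expand(sympy.Matrix.hstack(alpha, A * alpha).det())
print(d)                        # a**2 + b**2
print(A.charpoly(x).as_expr())  # x**2 + 1
```

For real a, b not both zero, $a^2 + b^2 > 0$, so no line in $R^2$ is invariant. Over the complex numbers the same expression vanishes, for example, at $(a, b) = (1, i)$.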

When the subspace W is invariant under the operator T, then T induces a linear operator $T_W$ on the space W. The linear operator $T_W$ is defined by $T_W(\alpha) = T(\alpha)$, for $\alpha$ in W, but $T_W$ is quite a different object from T since its domain is W, not V.

When V is finite-dimensional, the invariance of W under T has a 






Elementary Canonical Forms Chap. 6 


simple matrix interpretation, and perhaps we should mention it at this point. Suppose we choose an ordered basis $\mathcal{B} = \{\alpha_1, \dots, \alpha_n\}$ for V such that $\mathcal{B}' = \{\alpha_1, \dots, \alpha_r\}$ is an ordered basis for W ($r = \dim W$). Let $A = [T]_{\mathcal{B}}$ so that
$$T\alpha_j = \sum_{i=1}^{n} A_{ij}\alpha_i.$$
Since W is invariant under T, the vector $T\alpha_j$ belongs to W for $j \le r$. This means that
$$T\alpha_j = \sum_{i=1}^{r} A_{ij}\alpha_i, \qquad j \le r. \tag{6-9}$$
In other words, $A_{ij} = 0$ if $j \le r$ and $i > r$.
Schematically, A has the block form
$$A = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix} \tag{6-10}$$
where B is an $r \times r$ matrix, C is an $r \times (n - r)$ matrix, and D is an $(n - r) \times (n - r)$ matrix. The reader should note that according to (6-9) the matrix B is precisely the matrix of the induced operator $T_W$ in the ordered basis $\mathcal{B}'$.

Most often, we shall carry out arguments about T and $T_W$ without making use of the block form of the matrix A in (6-10). But we should note how certain relations between $T_W$ and T are apparent from that block form.


Lemma. Let W be an invariant subspace for T. The characteristic polynomial for the restriction operator $T_W$ divides the characteristic polynomial for T. The minimal polynomial for $T_W$ divides the minimal polynomial for T.


Proof. We have
$$A = \begin{bmatrix} B & C \\ 0 & D \end{bmatrix}$$
where $A = [T]_{\mathcal{B}}$ and $B = [T_W]_{\mathcal{B}'}$. Because of the block form of the matrix,
$$\det(xI - A) = \det(xI - B)\det(xI - D).$$
That proves the statement about characteristic polynomials. Notice that we used I to represent identity matrices of three different sizes.
The kth power of the matrix A has the block form
$$A^k = \begin{bmatrix} B^k & C_k \\ 0 & D^k \end{bmatrix}$$
where $C_k$ is some $r \times (n - r)$ matrix. Therefore, any polynomial which annihilates A also annihilates B (and D too). So, the minimal polynomial for B divides the minimal polynomial for A. ∎
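Here is a quick check of the lemma on a concrete block-triangular matrix. This is a SymPy sketch, and the entries of the blocks B, C, D are arbitrary illustrative choices. The characteristic polynomial of A factors as that of B times that of D, and the upper-left block of $A^k$ is $B^k$.

```python
import sympy

x = sympy.symbols('x')
# Illustrative blocks with r = 2, n = 5
B = sympy.Matrix([[1, 2], [3, 4]])
C = sympy.Matrix([[5, 6, 7], [8, 9, 10]])
D = sympy.Matrix([[0, 1, 0], [0, 0, 1], [2, 0, 0]])
A = sympy.Matrix.vstack(sympy.Matrix.hstack(B, C),
                        sympy.Matrix.hstack(sympy.zeros(3, 2), D))

f_A, f_B, f_D = (M.charpoly(x).as_expr() for M in (A, B, D))
print(sympy.expand(f_A - f_B * f_D))   # 0
print(sympy.rem(f_A, f_B, x))          # 0, i.e. f_B divides f_A
# The upper-left block of A**k is B**k, so any p with p(A) = 0 has p(B) = 0.
print((A**3)[:2, :2] == B**3)          # True
```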


EXAMPLE 10. Let T be any linear operator on a finite-dimensional space V. Let W be the subspace spanned by all of the characteristic vectors of T. Let $c_1, \dots, c_k$ be the distinct characteristic values of T. For each i, let $W_i$ be the space of characteristic vectors associated with the characteristic value $c_i$, and let $\mathcal{B}_i$ be an ordered basis for $W_i$. The lemma before Theorem 2 tells us that $\mathcal{B}' = (\mathcal{B}_1, \dots, \mathcal{B}_k)$ is an ordered basis for W. In particular,
$$\dim W = \dim W_1 + \cdots + \dim W_k.$$
Let $\mathcal{B}' = \{\alpha_1, \dots, \alpha_r\}$ so that the first few $\alpha$'s form the basis $\mathcal{B}_1$, the next few $\mathcal{B}_2$, and so on. Then
$$T\alpha_i = t_i\alpha_i, \qquad i = 1, \dots, r$$
where $(t_1, \dots, t_r) = (c_1, c_1, \dots, c_1, \dots, c_k, c_k, \dots, c_k)$ with $c_i$ repeated $\dim W_i$ times.
Now W is invariant under T, since for each $\alpha$ in W we have
$$\alpha = x_1\alpha_1 + \cdots + x_r\alpha_r, \qquad T\alpha = t_1x_1\alpha_1 + \cdots + t_rx_r\alpha_r.$$
Choose any other vectors $\alpha_{r+1}, \dots, \alpha_n$ in V such that $\mathcal{B} = \{\alpha_1, \dots, \alpha_n\}$ is a basis for V. The matrix of T relative to $\mathcal{B}$ has the block form (6-10), and the matrix of the restriction operator $T_W$ relative to the basis $\mathcal{B}'$ is
$$B = \begin{bmatrix} t_1 & 0 & \cdots & 0 \\ 0 & t_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & t_r \end{bmatrix}.$$
The characteristic polynomial of B (i.e., of $T_W$) is
$$g = (x - c_1)^{e_1} \cdots (x - c_k)^{e_k}$$
where $e_i = \dim W_i$. Furthermore, g divides f, the characteristic polynomial for T. Therefore, the multiplicity of $c_i$ as a root of f is at least $\dim W_i$.

All of this should make Theorem 2 transparent. It merely says that T is diagonalizable if and only if $r = n$, if and only if $e_1 + \cdots + e_k = n$. It does not help us too much with the non-diagonalizable case, since we don't know the matrices C and D of (6-10).


Definition. Let W be an invariant subspace for T and let $\alpha$ be a vector in V. The T-conductor of $\alpha$ into W is the set $S_T(\alpha; W)$, which consists of all polynomials g (over the scalar field) such that $g(T)\alpha$ is in W.


Since the operator T will be fixed throughout most discussions, we shall usually drop the subscript T and write $S(\alpha; W)$. The authors usually call that collection of polynomials the 'stuffer' (das einstopfende Ideal). 'Conductor' is the more standard term, preferred by those who envision a less aggressive operator $g(T)$, gently leading the vector $\alpha$ into W. In the special case $W = \{0\}$ the conductor is called the T-annihilator of $\alpha$.
Lemma. If W is an invariant subspace for T, then W is invariant under every polynomial in T. Thus, for each $\alpha$ in V, the conductor $S(\alpha; W)$ is an ideal in the polynomial algebra $F[x]$.


Proof. If $\beta$ is in W, then $T\beta$ is in W. Consequently, $T(T\beta) = T^2\beta$ is in W. By induction, $T^k\beta$ is in W for each k. Take linear combinations to see that $f(T)\beta$ is in W for every polynomial f.

The definition of $S(\alpha; W)$ makes sense if W is any subset of V. If W is a subspace, then $S(\alpha; W)$ is a subspace of $F[x]$, because
$$(f + g)(T) = f(T) + g(T).$$
If W is also invariant under T, let g be a polynomial in $S(\alpha; W)$, i.e., let $g(T)\alpha$ be in W. If f is any polynomial, then $f(T)[g(T)\alpha]$ will be in W. Since
$$(fg)(T) = f(T)g(T)$$
fg is in $S(\alpha; W)$. Thus the conductor absorbs multiplication by any polynomial. ∎


The unique monic generator of the ideal $S(\alpha; W)$ is also called the T-conductor of $\alpha$ into W (the T-annihilator in case $W = \{0\}$). The T-conductor of $\alpha$ into W is the monic polynomial g of least degree such that $g(T)\alpha$ is in W. A polynomial f is in $S(\alpha; W)$ if and only if g divides f. Note that the conductor $S(\alpha; W)$ always contains the minimal polynomial for T; hence, every T-conductor divides the minimal polynomial for T.
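The monic generator can be computed directly from its description as the monic g of least degree with $g(T)\alpha$ in W. The following SymPy sketch does this; `t_conductor` is a hypothetical helper, and it assumes W is given by a linearly independent list of vectors spanning an invariant subspace. For $k = 0, 1, 2, \dots$ it asks whether $A^k\alpha$ lies in the span of W together with $\alpha, A\alpha, \dots, A^{k-1}\alpha$. The first k for which this holds gives g.

```python
import sympy

x = sympy.symbols('x')

def t_conductor(A, alpha, W_basis):
    """Hypothetical helper: the monic g of least degree with g(A)*alpha in
    W = span(W_basis). Assumes W_basis is a linearly independent list of
    column vectors spanning an A-invariant subspace; pass [] for W = {0},
    which gives the T-annihilator of alpha."""
    krylov = []          # alpha, A*alpha, ..., A**(k-1)*alpha
    v = alpha            # currently A**k * alpha
    for k in range(A.shape[0] + 1):
        cols = list(W_basis) + krylov
        if not cols:                                  # k = 0 and W = {0}
            if all(e == 0 for e in v):
                return sympy.Integer(1)
        else:
            M = sympy.Matrix.hstack(*cols)
            if sympy.Matrix.hstack(M, v).rank() == M.rank():
                # A**k*alpha = w + a_0*alpha + ... + a_(k-1)*A**(k-1)*alpha with w in W;
                # for the least such k these coefficients are unique.
                sol, _ = M.gauss_jordan_solve(v)
                a = sol[len(W_basis):, 0]
                return sympy.expand(x**k - sum(a[i] * x**i for i in range(k)))
        krylov.append(v)
        v = A * v
    raise ValueError("no conductor found; is W really invariant under A?")

A = sympy.Matrix([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])  # Example 5
print(t_conductor(A, sympy.Matrix([1, 0, 1, 0]), []))    # x**2 - 4
print(t_conductor(A, sympy.Matrix([1, 0, -1, 0]), []))   # x
W0 = A.nullspace()        # characteristic vectors for c = 0 (an invariant subspace)
e1 = sympy.Matrix([1, 0, 0, 0])
print(t_conductor(A, e1, W0))                            # x**2 - 4
```

Each result divides $x^3 - 4x$, the minimal polynomial found in Example 5, as the remark above requires.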

As the first illustration of how to use the conductor S(a; W), we shall 
characterize triangulable operators. The linear operator T is called tri- 
angulable if there is an ordered basis in which T is represented by a 
triangular matrix. 


Lemma. Let V be a finite-dimensional vector space over the field F. Let T be a linear operator on V such that the minimal polynomial for T is a product of linear factors
$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}, \qquad c_i \text{ in } F.$$
Let W be a proper ($W \neq V$) subspace of V which is invariant under T. There exists a vector $\alpha$ in V such that

(a) $\alpha$ is not in W;
(b) $(T - cI)\alpha$ is in W, for some characteristic value c of the operator T.
Proof. What (a) and (b) say is that the T-conductor of $\alpha$ into W is a linear polynomial. Let $\beta$ be any vector in V which is not in W. Let g be the T-conductor of $\beta$ into W. Then g divides p, the minimal polynomial for T. Since $\beta$ is not in W, the polynomial g is not constant. Therefore,
$$g = (x - c_1)^{e_1} \cdots (x - c_k)^{e_k}$$
where at least one of the integers $e_i$ is positive. Choose j so that $e_j > 0$. Then $(x - c_j)$ divides g:
$$g = (x - c_j)h.$$
By the definition of g, the vector $\alpha = h(T)\beta$ cannot be in W. But
$$(T - c_jI)\alpha = (T - c_jI)h(T)\beta = g(T)\beta$$
is in W. ∎


Theorem 5. Let V be a finite-dimensional vector space over the field F and let T be a linear operator on V. Then T is triangulable if and only if the minimal polynomial for T is a product of linear polynomials over F.


Proof. Suppose that the minimal polynomial factors
$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}.$$
By repeated application of the lemma above, we shall arrive at an ordered basis $\mathcal{B} = \{\alpha_1, \dots, \alpha_n\}$ in which the matrix representing T is upper-triangular:
$$[T]_{\mathcal{B}} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & 0 & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn} \end{bmatrix}. \tag{6-11}$$
Now (6-11) merely says that
$$T\alpha_j = a_{1j}\alpha_1 + \cdots + a_{jj}\alpha_j, \qquad 1 \le j \le n, \tag{6-12}$$
that is, $T\alpha_j$ is in the subspace spanned by $\alpha_1, \dots, \alpha_j$. To find $\alpha_1, \dots, \alpha_n$, we start by applying the lemma to the subspace $W = \{0\}$, to obtain the vector $\alpha_1$. Then apply the lemma to $W_1$, the space spanned by $\alpha_1$, and we obtain $\alpha_2$. Next apply the lemma to $W_2$, the space spanned by $\alpha_1$ and $\alpha_2$. Continue in that way. One point deserves comment. After $\alpha_1, \dots, \alpha_i$ have been found, it is the triangular-type relations (6-12) for $j = 1, \dots, i$ which ensure that the subspace spanned by $\alpha_1, \dots, \alpha_i$ is invariant under T.

If T is triangulable, it is evident that the characteristic polynomial for T has the form
$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}, \qquad c_i \text{ in } F.$$
Just look at the triangular matrix (6-11). The diagonal entries $a_{11}, \dots, a_{nn}$ are the characteristic values, with $c_i$ repeated $d_i$ times. But, if f can be so factored, so can the minimal polynomial p, because it divides f. ∎
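The proof is constructive, and its loop can be transcribed almost literally. The following SymPy sketch uses a hypothetical helper `triangulating_basis`. It assumes that SymPy can find every characteristic value exactly, so that we are working over a field in which the minimal polynomial splits. At each stage it looks for $\alpha$ outside $W = \operatorname{span}(\alpha_1, \dots, \alpha_i)$ with $(A - cI)\alpha$ in W, by computing the null space of the matrix $[A - cI \mid -W]$.

```python
import sympy

def triangulating_basis(A):
    """Hypothetical helper following the proof of Theorem 5: repeatedly find
    alpha outside W = span(alpha_1, ..., alpha_i) with (A - cI)alpha in W for
    some characteristic value c. Returns the matrix P with columns alpha_j.
    Assumes SymPy finds every root of the characteristic polynomial exactly."""
    n = A.shape[0]
    eigenvalues = list(A.eigenvals().keys())
    basis = []
    while len(basis) < n:
        for c in eigenvalues:
            shifted = A - c * sympy.eye(n)
            if basis:
                W = sympy.Matrix.hstack(*basis)
                # null vectors (alpha, y) of [A - cI | -W] satisfy (A - cI)alpha = W y
                null = sympy.Matrix.hstack(shifted, -W).nullspace()
            else:
                null = shifted.nullspace()
            new = [v[:n, 0] for v in null
                   if sympy.Matrix.hstack(*basis, v[:n, 0]).rank() > len(basis)]
            if new:
                basis.append(new[0])
                break
        else:
            raise ValueError("no suitable vector found; a characteristic value is missing")
    return sympy.Matrix.hstack(*basis)

A = sympy.Matrix([[1, 1], [-1, 3]])     # characteristic polynomial (x - 2)**2
P = triangulating_basis(A)
T = P.inv() * A * P
print(T.is_upper, T[0, 0], T[1, 1])     # True 2 2
```

This matrix is not diagonalizable, since $A - 2I \neq 0$, yet it is triangulable because its minimal polynomial $(x - 2)^2$ splits.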


Corollary. Let F be an algebraically closed field, e.g., the complex number field. Every $n \times n$ matrix over F is similar over F to a triangular matrix.
Theorem 6. Let V be a finite-dimensional vector space over the field F and let T be a linear operator on V. Then T is diagonalizable if and only if the minimal polynomial for T has the form
$$p = (x - c_1) \cdots (x - c_k)$$
where $c_1, \dots, c_k$ are distinct elements of F.


Proof. We have noted earlier that, if T is diagonalizable, its minimal polynomial is a product of distinct linear factors (see the discussion prior to Example 4). To prove the converse, let W be the subspace spanned by all of the characteristic vectors of T, and suppose $W \neq V$. By the lemma used in the proof of Theorem 5, there is a vector $\alpha$ not in W and a characteristic value $c_j$ of T such that the vector
$$\beta = (T - c_jI)\alpha$$
lies in W. Since $\beta$ is in W,
$$\beta = \beta_1 + \cdots + \beta_k$$
where $T\beta_i = c_i\beta_i$, $1 \le i \le k$, and therefore the vector
$$h(T)\beta = h(c_1)\beta_1 + \cdots + h(c_k)\beta_k$$
is in W, for every polynomial h.
Now $p = (x - c_j)q$, for some polynomial q. Also
$$q - q(c_j) = (x - c_j)h.$$
We have
$$q(T)\alpha - q(c_j)\alpha = h(T)(T - c_jI)\alpha = h(T)\beta.$$
But $h(T)\beta$ is in W and, since
$$0 = p(T)\alpha = (T - c_jI)q(T)\alpha,$$
the vector $q(T)\alpha$ is either 0 or a characteristic vector of T associated with $c_j$; in either case it is in W. Therefore, $q(c_j)\alpha$ is in W. Since $\alpha$ is not in W, we have $q(c_j) = 0$. That contradicts the fact that p has distinct roots. ∎


At the end of Section 6.7, we shall give a different proof of Theorem 6. 
In addition to being an elegant result, Theorem 6 is useful in a computa- 
tional way. Suppose we have a linear operator T, represented by the matrix 
A in some ordered basis, and we wish to know if T is diagonalizable. We 
compute the characteristic polynomial f. If we can factor f:
$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}$$
we have two different methods for determining whether or not T is diagonalizable. One method is to see whether (for each i) we can find $d_i$ independent characteristic vectors associated with the characteristic value $c_i$. The other method is to check whether or not $(T - c_1I) \cdots (T - c_kI)$ is the zero operator.
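The second method is easy to mechanize. In the following SymPy sketch, `product_test` is a hypothetical helper. It assumes complex scalars and that `sympy.roots` finds every root of f; it then multiplies the factors $A - c_iI$ over the distinct roots and checks for the zero matrix. Because the $c_i$ are exactly the roots of the minimal polynomial, the product vanishes precisely when the minimal polynomial is $(x - c_1) \cdots (x - c_k)$.

```python
import sympy

x = sympy.symbols('x')

def product_test(A):
    """Hypothetical helper for the second method: with c_1, ..., c_k the
    distinct roots of f, A is diagonalizable iff (A - c_1 I)...(A - c_k I) = 0.
    Assumes complex scalars and that sympy.roots finds every root of f."""
    n = A.shape[0]
    roots = sympy.roots(A.charpoly(x).as_expr(), x)   # {root: multiplicity}
    if sum(roots.values()) != n:
        raise ValueError("could not factor the characteristic polynomial")
    product = sympy.eye(n)
    for c in roots:
        product = product * (A - c * sympy.eye(n))
    return product.applyfunc(sympy.simplify) == sympy.zeros(n, n)

print(product_test(sympy.Matrix([[0, 1, 0, 1], [1, 0, 1, 0],
                                 [0, 1, 0, 1], [1, 0, 1, 0]])))   # True (Example 5)
print(product_test(sympy.Matrix([[1, 1], [-1, 3]])))              # False
print(product_test(sympy.Matrix([[0, -1], [1, 0]])))              # True over C
```

The last call shows that the matrix of Example 9, which has no real characteristic values, is nevertheless diagonalizable over the complex numbers.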

Theorem 5 provides a different proof of the Cayley-Hamilton theorem. 
That theorem is easy for a triangular matrix. Hence, via Theorem 5, we 
obtain the result for any matrix over an algebraically closed field. Any
field is a subfield of an algebraically closed field. If one knows that result, 
one obtains a proof of the Cayley-Hamilton theorem for matrices over any 
field. If we at least admit into our discussion the Fundamental Theorem of 
Algebra (the complex number field is algebraically closed), then Theorem 5 
provides a proof of the Cayley-Hamilton theorem for complex matrices, 
and that proof is independent of the one which we gave earlier. 


Exercises 


1. Let T be the linear operator on $R^2$, the matrix of which in the standard ordered basis is
$$A = \begin{bmatrix} 1 & -1 \\ 2 & 2 \end{bmatrix}.$$

(a) Prove that the only subspaces of $R^2$ invariant under T are $R^2$ and the zero subspace.

(b) If U is the linear operator on $C^2$, the matrix of which in the standard ordered basis is A, show that U has 1-dimensional invariant subspaces.


2. Let W be an invariant subspace for T. Prove that the minimal polynomial for the restriction operator $T_W$ divides the minimal polynomial for T, without referring to matrices.


3. Let c be a characteristic value of T and let W be the space of characteristic vectors associated with the characteristic value c. What is the restriction operator $T_W$?


4. Let
$$A = \begin{bmatrix} 0 & 1 & 0 \\ 2 & -2 & 2 \\ 2 & -3 & 2 \end{bmatrix}.$$
Is A similar over the field of real numbers to a triangular matrix? If so, find such a triangular matrix.


5. Every matrix A such that $A^2 = A$ is similar to a diagonal matrix.


6. Let T be a diagonalizable linear operator on the n-dimensional vector space V, and let W be a subspace which is invariant under T. Prove that the restriction operator $T_W$ is diagonalizable.


7. Let T be a linear operator on a finite-dimensional vector space over the field of complex numbers. Prove that T is diagonalizable if and only if T is annihilated by some polynomial over C which has distinct roots.


8. Let T be a linear operator on V. If every subspace of V is invariant under T, then T is a scalar multiple of the identity operator.


9. Let T be the indefinite integral operator
$$(Tf)(x) = \int_0^x f(t)\,dt$$
on the space of continuous functions on the interval [0, 1]. Is the space of polynomial functions invariant under T? The space of differentiable functions? The space of functions which vanish at $x = \tfrac{1}{2}$?

10. Let A be a $3 \times 3$ matrix with real entries. Prove that, if A is not similar over R to a triangular matrix, then A is similar over C to a diagonal matrix.

11. True or false? If the triangular matrix A is similar to a diagonal matrix, then A is already diagonal.

12. Let T be a linear operator on a finite-dimensional vector space over an algebraically closed field F. Let f be a polynomial over F. Prove that c is a characteristic value of $f(T)$ if and only if $c = f(t)$, where t is a characteristic value of T.

13. Let V be the space of $n \times n$ matrices over F. Let A be a fixed $n \times n$ matrix over F. Let T and U be the linear operators on V defined by
$$T(B) = AB, \qquad U(B) = AB - BA.$$
(a) True or false? If A is diagonalizable (over F), then T is diagonalizable.
(b) True or false? If A is diagonalizable, then U is diagonalizable.


6.5. Simultaneous Triangulation; 
Simultaneous Diagonalization 


Let V be a finite-dimensional space and let $\mathcal{F}$ be a family of linear operators on V. We ask when we can simultaneously triangulate or diagonalize the operators in $\mathcal{F}$, i.e., find one basis $\mathcal{B}$ such that all of the matrices $[T]_{\mathcal{B}}$, T in $\mathcal{F}$, are triangular (or diagonal). In the case of diagonalization, it is necessary that $\mathcal{F}$ be a commuting family of operators: $UT = TU$ for all T, U in $\mathcal{F}$. That follows from the fact that all diagonal matrices commute. Of course, it is also necessary that each operator in $\mathcal{F}$ be a diagonalizable operator. In order to simultaneously triangulate, each operator in $\mathcal{F}$ must be triangulable. It is not necessary that $\mathcal{F}$ be a commuting family; however, that condition is sufficient for simultaneous triangulation (if each T can be individually triangulated). These results follow from minor variations of the proofs of Theorems 5 and 6.

The subspace W is invariant under (the family of operators) $\mathcal{F}$ if W is invariant under each operator in $\mathcal{F}$.


Lemma. Let $\mathcal{F}$ be a commuting family of triangulable linear operators on V. Let W be a proper subspace of V which is invariant under $\mathcal{F}$. There exists a vector $\alpha$ in V such that

(a) $\alpha$ is not in W;
(b) for each T in $\mathcal{F}$, the vector $T\alpha$ is in the subspace spanned by $\alpha$ and W.


Proof. It is no loss of generality to assume that $\mathcal{F}$ contains only a finite number of operators, because of this observation. Let $\{T_1, \dots, T_r\}$ be a maximal linearly independent subset of $\mathcal{F}$, i.e., a basis for the subspace spanned by $\mathcal{F}$. If $\alpha$ is a vector such that (b) holds for each $T_i$, then (b) will hold for every operator which is a linear combination of $T_1, \dots, T_r$.

By the lemma before Theorem 5 (this lemma for a single operator), we can find a vector $\beta_1$ (not in W) and a scalar $c_1$ such that $(T_1 - c_1I)\beta_1$ is in W. Let $V_1$ be the collection of all vectors $\beta$ in V such that $(T_1 - c_1I)\beta$ is in W. Then $V_1$ is a subspace of V which is properly larger than W. Furthermore, $V_1$ is invariant under $\mathcal{F}$, for this reason. If T commutes with $T_1$, then
$$(T_1 - c_1I)(T\beta) = T(T_1 - c_1I)\beta.$$
If $\beta$ is in $V_1$, then $(T_1 - c_1I)\beta$ is in W. Since W is invariant under each T in $\mathcal{F}$, we have $T(T_1 - c_1I)\beta$ in W, i.e., $T\beta$ in $V_1$, for all $\beta$ in $V_1$ and all T in $\mathcal{F}$.

Now W is a proper subspace of $V_1$. Let $U_2$ be the linear operator on $V_1$ obtained by restricting $T_2$ to the subspace $V_1$. The minimal polynomial for $U_2$ divides the minimal polynomial for $T_2$. Therefore, we may apply the lemma before Theorem 5 to that operator and the invariant subspace W. We obtain a vector $\beta_2$ in $V_1$ (not in W) and a scalar $c_2$ such that $(T_2 - c_2I)\beta_2$ is in W. Note that

(a) $\beta_2$ is not in W;
(b) $(T_1 - c_1I)\beta_2$ is in W;
(c) $(T_2 - c_2I)\beta_2$ is in W.

Let $V_2$ be the set of all vectors $\beta$ in $V_1$ such that $(T_2 - c_2I)\beta$ is in W. Then $V_2$ is invariant under $\mathcal{F}$. Apply the lemma before Theorem 5 to $U_3$, the restriction of $T_3$ to $V_2$. If we continue in this way, we shall reach a vector $\alpha = \beta_r$ (not in W) such that $(T_j - c_jI)\alpha$ is in W, $j = 1, \dots, r$. ∎


Theorem 7. Let V be a finite-dimensional vector space over the field F. Let $\mathcal{F}$ be a commuting family of triangulable linear operators on V. There exists an ordered basis for V such that every operator in $\mathcal{F}$ is represented by a triangular matrix in that basis.


Proof. Given the lemma which we just proved, this theorem has the same proof as does Theorem 5, if one replaces T by $\mathcal{F}$. ∎


Corollary. Let $\mathcal{F}$ be a commuting family of $n \times n$ matrices over an algebraically closed field F. There exists a non-singular $n \times n$ matrix P with entries in F such that $P^{-1}AP$ is upper-triangular, for every matrix A in $\mathcal{F}$.


Theorem 8. Let $\mathcal{F}$ be a commuting family of diagonalizable linear operators on the finite-dimensional vector space V. There exists an ordered basis for V such that every operator in $\mathcal{F}$ is represented in that basis by a diagonal matrix.


Proof. We could prove this theorem by adapting the lemma before Theorem 7 to the diagonalizable case, just as we adapted the lemma before Theorem 5 to the diagonalizable case in order to prove Theorem 6. However, at this point it is easier to proceed by induction on the dimension of V.

If dim V = 1, there is nothing to prove. Assume the theorem for vector spaces of dimension less than n, and let V be an n-dimensional space. If every operator in $\mathcal{F}$ is a scalar multiple of the identity, any basis for V will do. Otherwise, choose any T in $\mathcal{F}$ which is not a scalar multiple of the identity. Let $c_1, \dots, c_k$ be the distinct characteristic values of T, and (for each i) let $W_i$ be the null space of $T - c_iI$. Fix an index i. Then $W_i$ is invariant under every operator which commutes with T. Let $\mathcal{F}_i$ be the family of linear operators on $W_i$ obtained by restricting the operators in $\mathcal{F}$ to the (invariant) subspace $W_i$. Each operator in $\mathcal{F}_i$ is diagonalizable, because its minimal polynomial divides the minimal polynomial for the corresponding operator in $\mathcal{F}$. Since T is not a scalar multiple of the identity, $\dim W_i < \dim V$, so the operators in $\mathcal{F}_i$ can be simultaneously diagonalized. In other words, $W_i$ has a basis $\mathcal{B}_i$ which consists of vectors which are simultaneously characteristic vectors for every operator in $\mathcal{F}_i$.

Since T is diagonalizable, the lemma before Theorem 2 tells us that $\mathcal{B} = (\mathcal{B}_1, \dots, \mathcal{B}_k)$ is a basis for V. That is the basis we seek. ∎
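For two operators the induction has only one step, and it can be sketched directly. The following SymPy sketch uses a hypothetical helper `simultaneously_diagonalize`. It assumes the matrices have rational characteristic values, so that exact arithmetic works and $W^tW$ is invertible. It splits the space into the characteristic spaces $W_i$ of A, computes the matrix of the restriction of B to each $W_i$, and diagonalizes that restriction. The matrices P, A, B below are illustrative choices.

```python
import sympy

def simultaneously_diagonalize(A, B):
    """Hypothetical helper following the induction in Theorem 8, for two
    commuting diagonalizable matrices: split the space into the characteristic
    spaces W_i of A, then diagonalize the restriction of B to each W_i.
    Assumes rational characteristic values, so W^t W below is invertible."""
    if A * B != B * A:
        raise ValueError("A and B must commute")
    columns = []
    for c, _, vectors in A.eigenvects():
        W = sympy.Matrix.hstack(*vectors)      # basis of the null space of A - cI
        # Since AB = BA, B maps W_i into W_i, so B*W = W*B_W for a unique B_W:
        # the matrix of the restriction of B to W_i.
        B_W = (W.T * W).inv() * W.T * B * W
        for _, _, ws in B_W.eigenvects():
            columns.extend(W * w for w in ws)
    return sympy.Matrix.hstack(*columns)

P = sympy.Matrix([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
A = P * sympy.diag(1, 1, 2) * P.inv()    # characteristic value 1 is repeated
B = P * sympy.diag(3, 4, 4) * P.inv()    # B separates vectors that A does not
Q = simultaneously_diagonalize(A, B)
print(Q.inv() * A * Q, Q.inv() * B * Q)  # both diagonal
```

A basis of characteristic vectors of A alone need not diagonalize B, because the two-dimensional space $W_1$ admits many bases. The second diagonalization, carried out inside $W_1$, picks out the right one.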


Exercises 


1. Find an invertible real matrix P such that $P^{-1}AP$ and $P^{-1}BP$ are both diagonal, where A and B are the real matrices

(a) $A = \begin{bmatrix} 1 & 2 \\ 0 & 2 \end{bmatrix}, \quad B = \begin{bmatrix} 3 & -8 \\ 0 & -1 \end{bmatrix}$

(b) $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & a \\ a & 1 \end{bmatrix}$.


2. Let $\mathcal{F}$ be a commuting family of $3 \times 3$ complex matrices. How many linearly independent matrices can $\mathcal{F}$ contain? What about the $n \times n$ case?


3. Let T be a linear operator on an n-dimensional space, and suppose that T has n distinct characteristic values. Prove that any linear operator which commutes with T is a polynomial in T.


4. Let A, B, C, and D be $n \times n$ complex matrices which commute. Let E be the $2n \times 2n$ matrix
$$E = \begin{bmatrix} A & B \\ C & D \end{bmatrix}.$$
Prove that $\det E = \det(AD - BC)$.


5. Let F be a field, n a positive integer, and let V be the space of $n \times n$ matrices over F. If A is a fixed $n \times n$ matrix over F, let $T_A$ be the linear operator on V defined by $T_A(B) = AB - BA$. Consider the family of linear operators $T_A$ obtained by letting A vary over all diagonal matrices. Prove that the operators in that family are simultaneously diagonalizable.
6.6. Direct-Sum Decompositions


As we continue with our analysis of a single linear operator, we shall 
formulate our ideas in a slightly more sophisticated way—less in terms of 
matrices and more in terms of subspaces. When we began this chapter, we 
described our goal this way: To find an ordered basis in which the matrix 
of T assumes an especially simple form. Now, we shall describe our goal 
as follows: To decompose the underlying space V into a sum of invariant 
subspaces for T such that the restriction operators on those subspaces are
simple. 


Definition. Let $W_1, \dots, W_k$ be subspaces of the vector space V. We say that $W_1, \dots, W_k$ are independent if
$$\alpha_1 + \cdots + \alpha_k = 0, \qquad \alpha_i \text{ in } W_i$$
implies that each $\alpha_i$ is 0.


For k = 2, the meaning of independence is {0} intersection, i.e., $W_1$ and $W_2$ are independent if and only if $W_1 \cap W_2 = \{0\}$. If $k > 2$, the independence of $W_1, \dots, W_k$ says much more than $W_1 \cap \cdots \cap W_k = \{0\}$. It says that each $W_j$ intersects the sum of the other subspaces $W_i$ only in the zero vector.

The significance of independence is this. Let $W = W_1 + \cdots + W_k$ be the subspace spanned by $W_1, \dots, W_k$. Each vector $\alpha$ in W can be expressed as a sum
$$\alpha = \alpha_1 + \cdots + \alpha_k, \qquad \alpha_i \text{ in } W_i.$$
If $W_1, \dots, W_k$ are independent, then that expression for $\alpha$ is unique; for if
$$\alpha = \beta_1 + \cdots + \beta_k, \qquad \beta_i \text{ in } W_i$$
then $0 = (\alpha_1 - \beta_1) + \cdots + (\alpha_k - \beta_k)$, hence $\alpha_i - \beta_i = 0$, $i = 1, \dots, k$. Thus, when $W_1, \dots, W_k$ are independent, we can operate with the vectors in W as k-tuples $(\alpha_1, \dots, \alpha_k)$, $\alpha_i$ in $W_i$, in the same way as we operate with vectors in $R^k$ as k-tuples of numbers.


Lemma. Let V be a finite-dimensional vector space. Let $W_1, \dots, W_k$ be subspaces of V and let $W = W_1 + \cdots + W_k$. The following are equivalent.

(a) $W_1, \dots, W_k$ are independent.
(b) For each j, $2 \le j \le k$, we have
$$W_j \cap (W_1 + \cdots + W_{j-1}) = \{0\}.$$
(c) If $\mathcal{B}_i$ is an ordered basis for $W_i$, $1 \le i \le k$, then the sequence $\mathcal{B} = (\mathcal{B}_1, \dots, \mathcal{B}_k)$ is an ordered basis for W.
Proof. Assume (a). Let $\alpha$ be a vector in the intersection $W_j \cap (W_1 + \cdots + W_{j-1})$. Then there are vectors $\alpha_1, \dots, \alpha_{j-1}$ with $\alpha_i$ in $W_i$ such that $\alpha = \alpha_1 + \cdots + \alpha_{j-1}$. Since
$$\alpha_1 + \cdots + \alpha_{j-1} + (-\alpha) + 0 + \cdots + 0 = 0$$
and since $W_1, \dots, W_k$ are independent, it must be that $\alpha_1 = \alpha_2 = \cdots = \alpha_{j-1} = \alpha = 0$.
Now, let us observe that (b) implies (a). Suppose
$$0 = \alpha_1 + \cdots + \alpha_k, \qquad \alpha_i \text{ in } W_i,$$
and suppose some $\alpha_i \neq 0$. Let j be the largest integer i such that $\alpha_i \neq 0$. Then
$$0 = \alpha_1 + \cdots + \alpha_j, \qquad \alpha_j \neq 0.$$
Here $j \ge 2$, since $j = 1$ would give $\alpha_1 = 0$. Thus $\alpha_j = -\alpha_1 - \cdots - \alpha_{j-1}$ is a non-zero vector in $W_j \cap (W_1 + \cdots + W_{j-1})$, which contradicts (b).

Now that we know (a) and (b) are the same, let us see why (a) is equivalent to (c). Assume (a). Let $\mathcal{B}_i$ be a basis for $W_i$, $1 \le i \le k$, and let $\mathcal{B} = (\mathcal{B}_1, \dots, \mathcal{B}_k)$. Any linear relation between the vectors in $\mathcal{B}$ will have the form
$$\beta_1 + \cdots + \beta_k = 0$$
where $\beta_i$ is some linear combination of the vectors in $\mathcal{B}_i$. Since $W_1, \dots, W_k$ are independent, each $\beta_i$ is 0. Since each $\mathcal{B}_i$ is independent, the relation we have between the vectors in $\mathcal{B}$ is the trivial relation.

We relegate the proof that (c) implies (a) to the exercises (Exercise 2). ∎
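Condition (c), combined with Exercise 2 below (applied to W in place of V), says that $W_1, \dots, W_k$ are independent exactly when $\dim(W_1 + \cdots + W_k) = \dim W_1 + \cdots + \dim W_k$. That yields a simple rank test. The following SymPy sketch uses a hypothetical helper `independent`. The second call illustrates the remark above that, for $k > 2$, small intersections are not enough: three lines in a plane meet pairwise only in 0, yet they are not independent.

```python
import sympy

def independent(*spanning_sets):
    """Hypothetical helper: W_1, ..., W_k, each given by a list of spanning
    column vectors, are independent iff dim(W_1 + ... + W_k) equals
    dim W_1 + ... + dim W_k."""
    dims = [sympy.Matrix.hstack(*S).rank() for S in spanning_sets]
    total = sympy.Matrix.hstack(*[v for S in spanning_sets for v in S]).rank()
    return total == sum(dims)

e1, e2, e3 = (sympy.Matrix(v) for v in ([1, 0, 0], [0, 1, 0], [0, 0, 1]))
print(independent([e1], [e2], [e3]))        # True
# pairwise intersections are {0}, yet the three lines are not independent
print(independent([e1], [e2], [e1 + e2]))   # False
```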


If any (and hence all) of the conditions of the last lemma hold, we say that the sum $W = W_1 + \cdots + W_k$ is direct or that W is the direct sum of $W_1, \dots, W_k$ and we write
$$W = W_1 \oplus \cdots \oplus W_k.$$
In the literature, the reader may find this direct sum referred to as an independent sum or the interior direct sum of $W_1, \dots, W_k$.


EXAMPLE 11. Let V be a finite-dimensional vector space over the field F and let $\{\alpha_1, \dots, \alpha_n\}$ be any basis for V. If $W_i$ is the one-dimensional subspace spanned by $\alpha_i$, then $V = W_1 \oplus \cdots \oplus W_n$.


EXAMPLE 12. Let n be a positive integer and F a subfield of the complex numbers, and let V be the space of all $n \times n$ matrices over F. Let $W_1$ be the subspace of all symmetric matrices, i.e., matrices A such that $A^t = A$. Let $W_2$ be the subspace of all skew-symmetric matrices, i.e., matrices A such that $A^t = -A$. Then $V = W_1 \oplus W_2$. If A is any matrix in V, the unique expression for A as a sum of matrices, one in $W_1$ and the other in $W_2$, is
$$A = A_1 + A_2, \qquad A_1 = \tfrac{1}{2}(A + A^t), \qquad A_2 = \tfrac{1}{2}(A - A^t).$$
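A short SymPy sketch of this decomposition follows; the matrix A is an arbitrary illustrative choice. The division by 2 is where we use that 2 is invertible in F, which holds for any subfield of the complex numbers.

```python
import sympy

A = sympy.Matrix([[1, 2, 0], [4, 5, 6], [7, 8, 9]])   # any square matrix over Q
A1 = (A + A.T) / 2     # symmetric part, lies in W_1
A2 = (A - A.T) / 2     # skew-symmetric part, lies in W_2
print(A1 == A1.T, A2 == -A2.T, A1 + A2 == A)   # True True True
```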

EXAMPLE 13. Let T be any linear operator on a finite-dimensional space V. Let $c_1, \dots, c_k$ be the distinct characteristic values of T, and let $W_i$ be the space of characteristic vectors associated with the characteristic value $c_i$. Then $W_1, \dots, W_k$ are independent. See the lemma before Theorem 2. In particular, if T is diagonalizable, then $V = W_1 \oplus \cdots \oplus W_k$.


Definition. If V is a vector space, a projection of V is a linear operator E on V such that $E^2 = E$.


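Suppose that E is a projection. Let R be the range of E and let N be the null space of E.


1. The vector $\beta$ is in the range R if and only if $E\beta = \beta$. If $\beta = E\alpha$, then $E\beta = E^2\alpha = E\alpha = \beta$. Conversely, if $\beta = E\beta$, then (of course) $\beta$ is in the range of E.

2. $V = R \oplus N$.

3. The unique expression for $\alpha$ as a sum of vectors in R and N is $\alpha = E\alpha + (\alpha - E\alpha)$.

(Here $\alpha - E\alpha$ is in N because $E(\alpha - E\alpha) = E\alpha - E^2\alpha = 0$, and $R \cap N = \{0\}$ because a vector $\beta$ in both satisfies $\beta = E\beta = 0$ by (1).)

From (1), (2), (3) it is easy to see the following. If R and N are subspaces of V such that $V = R \oplus N$, there is one and only one projection operator E which has range R and null space N. That operator is called the projection on R along N.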

Any projection E is (trivially) diagonalizable. If $\{\alpha_1, \dots, \alpha_r\}$ is a basis for R and $\{\alpha_{r+1}, \dots, \alpha_n\}$ a basis for N, then the basis $\mathcal{B} = \{\alpha_1, \dots, \alpha_n\}$ diagonalizes E:
$$[E]_{\mathcal{B}} = \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix}$$
where I is the $r \times r$ identity matrix. That should help explain some of the terminology connected with projections. The reader should look at various cases in the plane $R^2$ (or 3-space, $R^3$), to convince himself that the projection on R along N sends each vector into R by projecting it parallel to N.
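The matrix description above gives a recipe for the projection on R along N in standard coordinates: $E = P \begin{bmatrix} I & 0 \\ 0 & 0 \end{bmatrix} P^{-1}$, where the columns of P are the basis vectors of R followed by those of N. The following SymPy sketch implements this recipe; `projection_on_along` is a hypothetical helper, and the subspaces in the usage line are illustrative choices.

```python
import sympy

def projection_on_along(R_basis, N_basis):
    """Hypothetical helper: E with range span(R_basis) and null space
    span(N_basis), built as P [[I, 0], [0, 0]] P^{-1} where the columns of P
    are the two bases. Assumes V = R (+) N, so that P is invertible."""
    P = sympy.Matrix.hstack(*R_basis, *N_basis)
    r, n = len(R_basis), P.shape[0]
    middle = sympy.diag(*([1] * r + [0] * (n - r)))
    return P * middle * P.inv()

# In R^2: project onto the x-axis along the line spanned by (1, 1).
E = projection_on_along([sympy.Matrix([1, 0])], [sympy.Matrix([1, 1])])
print(E)               # Matrix([[1, -1], [0, 0]])
print(E**2 == E)       # True
```

Geometrically, E sends $(a, b)$ to $(a - b, 0)$. It moves each point parallel to the line spanned by (1, 1) until the point reaches the x-axis.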

Projections can be used to describe direct-sum decompositions of the space V. For, suppose $V = W_1 \oplus \cdots \oplus W_k$. For each j we shall define an operator $E_j$ on V. Let $\alpha$ be in V, say $\alpha = \alpha_1 + \cdots + \alpha_k$ with $\alpha_i$ in $W_i$. Define $E_j\alpha = \alpha_j$. Then $E_j$ is a well-defined rule. It is easy to see that $E_j$ is linear, that the range of $E_j$ is $W_j$, and that $E_j^2 = E_j$. The null space of $E_j$ is the subspace
$$W_1 + \cdots + W_{j-1} + W_{j+1} + \cdots + W_k$$
for, the statement that $E_j\alpha = 0$ simply means $\alpha_j = 0$, i.e., that $\alpha$ is actually a sum of vectors from the spaces $W_i$ with $i \neq j$. In terms of the projections $E_j$ we have
$$\alpha = E_1\alpha + \cdots + E_k\alpha \tag{6-13}$$
for each $\alpha$ in V. What (6-13) says is that
$$I = E_1 + \cdots + E_k.$$
Note also that if $i \neq j$, then $E_iE_j = 0$, because the range of $E_j$ is the subspace $W_j$ which is contained in the null space of $E_i$. We shall now summarize our findings and state and prove a converse.


Theorem 9. If $V = W_1 \oplus \cdots \oplus W_k$, then there exist k linear operators $E_1, \dots, E_k$ on V such that

(i) each $E_i$ is a projection ($E_i^2 = E_i$);
(ii) $E_iE_j = 0$, if $i \neq j$;
(iii) $I = E_1 + \cdots + E_k$;
(iv) the range of $E_i$ is $W_i$.

Conversely, if $E_1, \dots, E_k$ are k linear operators on V which satisfy conditions (i), (ii), and (iii), and if we let $W_i$ be the range of $E_i$, then $V = W_1 \oplus \cdots \oplus W_k$.


Proof. We have only to prove the converse statement. Suppose $E_1, \ldots, E_k$ are linear operators on $V$ which satisfy the first three conditions, and let $W_i$ be the range of $E_i$. Then certainly

$$V = W_1 + \cdots + W_k;$$

for, by condition (iii) we have

$$\alpha = E_1\alpha + \cdots + E_k\alpha$$

for each $\alpha$ in $V$, and $E_i\alpha$ is in $W_i$. This expression for $\alpha$ is unique, because if

$$\alpha = \alpha_1 + \cdots + \alpha_k$$

with $\alpha_i$ in $W_i$, say $\alpha_i = E_i\beta_i$, then using (i) and (ii) we have

$$E_j\alpha = \sum_{i=1}^{k} E_j\alpha_i = \sum_{i=1}^{k} E_jE_i\beta_i = E_j^2\beta_j = E_j\beta_j = \alpha_j.$$

This shows that $V$ is the direct sum of the $W_i$. ∎
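To see conditions (i) to (iii) concretely, here is a small sketch in Python. It is an illustration only, not part of the text. It assumes the vectors are rational $n$-tuples and that each $W_j$ is supplied as a list of basis vectors; the helper names `direct_sum_projections`, `inverse` and `mat_mul` are hypothetical. It builds $E_j = PD_jP^{-1}$. Here the columns of $P$ are the combined basis, and $D_j$ is the diagonal matrix that keeps only the coordinates belonging to $W_j$. This is exactly the rule $E_j\alpha = \alpha_j$.

```python
from fractions import Fraction


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def inverse(M):
    """Gauss-Jordan inverse over the rationals; M is assumed invertible."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        pv = aug[col][col]
        aug[col] = [x / pv for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def direct_sum_projections(bases):
    """bases[j] lists basis vectors of W_j; together they must form a basis of V.
    Returns E_1, ..., E_k with E_j(alpha_1 + ... + alpha_k) = alpha_j."""
    vectors = [v for basis in bases for v in basis]
    n = len(vectors)
    P = [[Fraction(vectors[j][i]) for j in range(n)] for i in range(n)]  # basis as columns
    P_inv = inverse(P)  # sends alpha to its coordinates in the combined basis
    projections, start = [], 0
    for basis in bases:
        stop = start + len(basis)
        keep = [[Fraction(int(i == j and start <= i < stop)) for j in range(n)]
                for i in range(n)]  # D_j: keep only the W_j coordinates
        projections.append(mat_mul(mat_mul(P, keep), P_inv))
        start = stop
    return projections


# Hypothetical example: R^2 = W_1 (+) W_2 with W_1 = span{(1, 0)}, W_2 = span{(1, 1)}.
E1, E2 = direct_sum_projections([[(1, 0)], [(1, 1)]])
I, Z = [[1, 0], [0, 1]], [[0, 0], [0, 0]]
assert mat_mul(E1, E1) == E1 and mat_mul(E2, E2) == E2                # (i)
assert mat_mul(E1, E2) == Z and mat_mul(E2, E1) == Z                  # (ii)
assert [[a + b for a, b in zip(r, s)] for r, s in zip(E1, E2)] == I   # (iii)
print(E1 == [[1, -1], [0, 0]], E2 == [[0, 1], [0, 1]])                # True True
```

The code uses exact `Fraction` arithmetic, so identities such as $E_1^2 = E_1$ can be tested with `==` rather than with a floating-point tolerance.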
Exercises


1. Let $V$ be a finite-dimensional vector space and let $W_1$ be any subspace of $V$. Prove that there is a subspace $W_2$ of $V$ such that $V = W_1 \oplus W_2$.

2. Let $V$ be a finite-dimensional vector space and let $W_1, \ldots, W_k$ be subspaces of $V$ such that

$$V = W_1 + \cdots + W_k \quad \text{and} \quad \dim V = \dim W_1 + \cdots + \dim W_k.$$

Prove that $V = W_1 \oplus \cdots \oplus W_k$.

3. Find a projection $E$ which projects $R^2$ onto the subspace spanned by $(1, -1)$ along the subspace spanned by $(1, 2)$.

4. If $E_1$ and $E_2$ are projections onto independent subspaces, then $E_1 + E_2$ is a projection. True or false?

5. If $E$ is a projection and $f$ is a polynomial, then $f(E) = aI + bE$. What are $a$ and $b$ in terms of the coefficients of $f$?

6. True or false? If a diagonalizable operator has only the characteristic values 0 and 1, it is a projection.

7. Prove that if $E$ is the projection on $R$ along $N$, then $(I - E)$ is the projection on $N$ along $R$.

8. Let $E_1, \ldots, E_k$ be linear operators on the space $V$ such that $E_1 + \cdots + E_k = I$.
(a) Prove that if $E_iE_j = 0$ for $i \neq j$, then $E_i^2 = E_i$ for each $i$.
(b) In the case $k = 2$, prove the converse of (a). That is, if $E_1 + E_2 = I$ and $E_1^2 = E_1$, $E_2^2 = E_2$, then $E_1E_2 = 0$.

9. Let $V$ be a real vector space and $E$ an idempotent linear operator on $V$, i.e., a projection. Prove that $(I + E)$ is invertible. Find $(I + E)^{-1}$.

10. Let $F$ be a subfield of the complex numbers (or, a field of characteristic zero). Let $V$ be a finite-dimensional vector space over $F$. Suppose that $E_1, \ldots, E_k$ are projections of $V$ and that $E_1 + \cdots + E_k = I$. Prove that $E_iE_j = 0$ for $i \neq j$. (Hint: Use the trace function and ask yourself what the trace of a projection is.)

11. Let $V$ be a vector space, let $W_1, \ldots, W_k$ be subspaces of $V$, and let

$$V_j = W_1 + \cdots + W_{j-1} + W_{j+1} + \cdots + W_k.$$

Suppose that $V = W_1 \oplus \cdots \oplus W_k$. Prove that the dual space $V^*$ has the direct-sum decomposition $V^* = V_1^0 \oplus \cdots \oplus V_k^0$.
We are primarily interested in direct-sum decompositions $V = W_1 \oplus \cdots \oplus W_k$, where each of the subspaces $W_i$ is invariant under some given linear operator $T$. Given such a decomposition of $V$, $T$ induces a linear operator $T_i$ on each $W_i$ by restriction. The action of $T$ is then this. If $\alpha$ is a vector in $V$, we have unique vectors $\alpha_1, \ldots, \alpha_k$ with $\alpha_i$ in $W_i$ such that

$$\alpha = \alpha_1 + \cdots + \alpha_k$$

and then

$$T\alpha = T_1\alpha_1 + \cdots + T_k\alpha_k.$$


We shall describe this situation by saying that $T$ is the direct sum of the operators $T_1, \ldots, T_k$. It must be remembered in using this terminology that the $T_i$ are not linear operators on the space $V$ but on the various subspaces $W_i$. The fact that $V = W_1 \oplus \cdots \oplus W_k$ enables us to associate with each $\alpha$ in $V$ a unique $k$-tuple $(\alpha_1, \ldots, \alpha_k)$ of vectors $\alpha_i$ in $W_i$ (by $\alpha = \alpha_1 + \cdots + \alpha_k$) in such a way that we can carry out the linear operations in $V$ by working in the individual subspaces $W_i$. The fact that each $W_i$ is invariant under $T$ enables us to view the action of $T$ as the independent action of the operators $T_i$ on the subspaces $W_i$. Our purpose is to study $T$ by finding invariant direct-sum decompositions in which the $T_i$ are operators of an elementary nature.

Before looking at an example, let us note the matrix analogue of this situation. Suppose we select an ordered basis $\mathcal{B}_i$ for each $W_i$, and let $\mathcal{B}$ be the ordered basis for $V$ consisting of the union of the $\mathcal{B}_i$ arranged in the order $\mathcal{B}_1, \ldots, \mathcal{B}_k$, so that $\mathcal{B}$ is a basis for $V$. From our discussion concerning the matrix analogue for a single invariant subspace, it is easy to see that if $A = [T]_{\mathcal{B}}$ and $A_i = [T_i]_{\mathcal{B}_i}$, then $A$ has the block form

$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & A_k \end{bmatrix}. \tag{6-14}$$

In (6-14), $A_i$ is a $d_i \times d_i$ matrix ($d_i = \dim W_i$), and the 0's are symbols for rectangular blocks of scalar 0's of various sizes. It also seems appropriate to describe (6-14) by saying that $A$ is the direct sum of the matrices $A_1, \ldots, A_k$.
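The block form (6-14) can be illustrated with a short sketch. The helper names `direct_sum` and `apply` are hypothetical, and the two blocks are arbitrary sample matrices, not taken from the text. The sketch assembles $A = A_1 \oplus A_2$ and checks that letting $A$ act on the coordinate vector of $\alpha$ gives the same result as letting each $A_i$ act separately on the coordinates of $\alpha_i$.

```python
def direct_sum(*blocks):
    """Block-diagonal matrix A = A_1 (+) ... (+) A_k, as in (6-14)."""
    n = sum(len(B) for B in blocks)
    A = [[0] * n for _ in range(n)]
    offset = 0
    for B in blocks:
        d = len(B)  # d_i = dim W_i
        for i in range(d):
            for j in range(d):
                A[offset + i][offset + j] = B[i][j]
        offset += d
    return A


def apply(A, x):
    """Matrix times coordinate vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


A1 = [[2, 1], [0, 2]]   # sample matrix of T_1 on a 2-dimensional W_1
A2 = [[-1]]             # sample matrix of T_2 on a 1-dimensional W_2
A = direct_sum(A1, A2)
print(A)                # [[2, 1, 0], [0, 2, 0], [0, 0, -1]]

# The coordinates of alpha split as (alpha_1 | alpha_2); A acts on each part separately.
x1, x2 = [3, 5], [7]
assert apply(A, x1 + x2) == apply(A1, x1) + apply(A2, x2)
```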

Most often, we shall describe the subspace $W_i$ by means of the associated projections $E_i$ (Theorem 9). Therefore, we need to be able to phrase the invariance of the subspaces $W_i$ in terms of the $E_i$.


Theorem 10. Let $T$ be a linear operator on the space $V$, and let $W_1, \ldots, W_k$ and $E_1, \ldots, E_k$ be as in Theorem 9. Then a necessary and sufficient condition that each subspace $W_i$ be invariant under $T$ is that $T$ commute with each of the projections $E_i$, i.e.,

$$TE_i = E_iT, \qquad i = 1, \ldots, k.$$

Proof. Suppose $T$ commutes with each $E_i$. Let $\alpha$ be in $W_j$. Then $E_j\alpha = \alpha$, and

$$T\alpha = T(E_j\alpha) = E_j(T\alpha),$$

which shows that $T\alpha$ is in the range of $E_j$, i.e., that $W_j$ is invariant under $T$.

Assume now that each $W_i$ is invariant under $T$. We shall show that $TE_j = E_jT$. Let $\alpha$ be any vector in $V$. Then

$$\begin{aligned} \alpha &= E_1\alpha + \cdots + E_k\alpha \\ T\alpha &= TE_1\alpha + \cdots + TE_k\alpha. \end{aligned}$$

Since $E_i\alpha$ is in $W_i$, which is invariant under $T$, we must have $T(E_i\alpha) = E_i\beta_i$ for some vector $\beta_i$. Then

$$E_jTE_i\alpha = E_jE_i\beta_i = \begin{cases} 0, & \text{if } i \neq j \\ E_j\beta_j, & \text{if } i = j. \end{cases}$$

Thus

$$E_jT\alpha = E_jTE_1\alpha + \cdots + E_jTE_k\alpha = E_j\beta_j = TE_j\alpha.$$

This holds for each $\alpha$ in $V$, so $E_jT = TE_j$. ∎
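Theorem 10 can be spot-checked on small matrices. The sketch below is an illustration under stated assumptions. It takes $V = R^2$ split into the two coordinate axes, and two sample operators of our own choosing. It relies on the fact that a vector $\gamma$ lies in the range of a projection $E_j$ exactly when $E_j\gamma = \gamma$. So $W_j$ is invariant under $T$ if and only if $E_jTE_j = TE_j$, and the theorem predicts that this test always agrees with the commutation test $TE_j = E_jT$.

```python
def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def invariant_ranges(T, projections):
    """True if every W_j = range(E_j) is T-invariant.  A vector gamma is in
    range(E_j) exactly when E_j gamma = gamma, so test E_j (T E_j) == T E_j."""
    return all(mat_mul(E, mat_mul(T, E)) == mat_mul(T, E) for E in projections)


def commutes_with_all(T, projections):
    """True if T E_j == E_j T for every j."""
    return all(mat_mul(T, E) == mat_mul(E, T) for E in projections)


E1 = [[1, 0], [0, 0]]   # projection on span{e1} along span{e2}
E2 = [[0, 0], [0, 1]]   # projection on span{e2} along span{e1}

for T in ([[3, 0], [0, 5]], [[2, 1], [0, 2]]):
    # Theorem 10: the two tests must give the same answer for every T.
    assert invariant_ranges(T, [E1, E2]) == commutes_with_all(T, [E1, E2])
    print(T, commutes_with_all(T, [E1, E2]))   # True for the diagonal T, False for the other
```

For the second operator, $T\epsilon_2 = (1, 2)$ leaves the line spanned by $\epsilon_2$, and correspondingly $T$ fails to commute with $E_1$.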


We shall now describe a diagonalizable operator $T$ in the language of invariant direct-sum decompositions (projections which commute with $T$). This will be a great help to us in understanding some deeper decomposition theorems later. The reader may feel that the description which we are about to give is rather complicated, in comparison to the matrix formulation or to the simple statement that the characteristic vectors of $T$ span the underlying space. But he should bear in mind that this is our first glimpse at a very effective method, by means of which various problems concerned with subspaces, bases, matrices, and the like can be reduced to algebraic calculations with linear operators. With a little experience, the efficiency and elegance of this method of reasoning should become apparent.


Theorem 11. Let $T$ be a linear operator on a finite-dimensional space $V$. If $T$ is diagonalizable and if $c_1, \ldots, c_k$ are the distinct characteristic values of $T$, then there exist linear operators $E_1, \ldots, E_k$ on $V$ such that

(i) $T = c_1E_1 + \cdots + c_kE_k$;
(ii) $I = E_1 + \cdots + E_k$;
(iii) $E_iE_j = 0$, $i \neq j$;
(iv) $E_i^2 = E_i$ ($E_i$ is a projection);
(v) the range of $E_i$ is the characteristic space for $T$ associated with $c_i$.

Conversely, if there exist $k$ distinct scalars $c_1, \ldots, c_k$ and $k$ non-zero linear operators $E_1, \ldots, E_k$ which satisfy conditions (i), (ii), and (iii), then $T$ is diagonalizable, $c_1, \ldots, c_k$ are the distinct characteristic values of $T$, and conditions (iv) and (v) are satisfied also.


Elementary Canonical Forms Chap. 6

Proof. Suppose that $T$ is diagonalizable, with distinct characteristic values $c_1, \ldots, c_k$. Let $W_i$ be the space of characteristic vectors associated with the characteristic value $c_i$. As we have seen,

$$V = W_1 \oplus \cdots \oplus W_k.$$

Let $E_1, \ldots, E_k$ be the projections associated with this decomposition, as in Theorem 9. Then (ii), (iii), (iv) and (v) are satisfied. To verify (i), proceed as follows. For each $\alpha$ in $V$,

$$\alpha = E_1\alpha + \cdots + E_k\alpha$$

and so

$$T\alpha = TE_1\alpha + \cdots + TE_k\alpha = c_1E_1\alpha + \cdots + c_kE_k\alpha.$$

In other words, $T = c_1E_1 + \cdots + c_kE_k$.

Now suppose that we are given a linear operator $T$ along with distinct scalars $c_i$ and non-zero operators $E_i$ which satisfy (i), (ii) and (iii). Since $E_iE_j = 0$ when $i \neq j$, we multiply both sides of $I = E_1 + \cdots + E_k$ by $E_i$ and obtain immediately $E_i^2 = E_i$. Multiplying $T = c_1E_1 + \cdots + c_kE_k$ by $E_i$, we then have $TE_i = c_iE_i$, which shows that any vector in the range of $E_i$ is in the null space of $(T - c_iI)$. Since we have assumed that $E_i \neq 0$, this proves that there is a non-zero vector in the null space of $(T - c_iI)$, i.e., that $c_i$ is a characteristic value of $T$. Furthermore, the $c_i$ are all of the characteristic values of $T$; for, if $c$ is any scalar, then

$$T - cI = (c_1 - c)E_1 + \cdots + (c_k - c)E_k,$$

so if $(T - cI)\alpha = 0$, we must have $(c_i - c)E_i\alpha = 0$ for each $i$ (multiply by $E_i$ and use (iii) together with $E_i^2 = E_i$). If $\alpha$ is not the zero vector, then $E_i\alpha \neq 0$ for some $i$ (since $\alpha = E_1\alpha + \cdots + E_k\alpha$), so that for this $i$ we have $c_i - c = 0$.

Certainly $T$ is diagonalizable, since we have shown that every non-zero vector in the range of $E_i$ is a characteristic vector of $T$, and the fact that $I = E_1 + \cdots + E_k$ shows that these characteristic vectors span $V$. All that remains to be demonstrated is that the null space of $(T - c_iI)$ is exactly the range of $E_i$. But this is clear, because if $T\alpha = c_i\alpha$, then

$$\sum_{j=1}^{k} (c_j - c_i)E_j\alpha = 0,$$

hence

$$(c_j - c_i)E_j\alpha = 0 \quad \text{for each } j,$$

and then

$$E_j\alpha = 0, \qquad j \neq i.$$

Since $\alpha = E_1\alpha + \cdots + E_k\alpha$, and $E_j\alpha = 0$ for $j \neq i$, we have $\alpha = E_i\alpha$, which proves that $\alpha$ is in the range of $E_i$. ∎


One part of Theorem 11 says that for a diagonalizable operator $T$, the scalars $c_1, \ldots, c_k$ and the operators $E_1, \ldots, E_k$ are uniquely determined by conditions (i), (ii), (iii), the fact that the $c_i$ are distinct, and the fact that the $E_i$ are non-zero. One of the pleasant features of the decomposition $T = c_1E_1 + \cdots + c_kE_k$ is that if $g$ is any polynomial over the field $F$, then

$$g(T) = g(c_1)E_1 + \cdots + g(c_k)E_k.$$

We leave the details of the proof to the reader. To see how it is proved one need only compute $T^r$ for each positive integer $r$. For example,

$$T^2 = \sum_{i=1}^{k} c_iE_i \sum_{j=1}^{k} c_jE_j = \sum_{i=1}^{k}\sum_{j=1}^{k} c_ic_jE_iE_j = \sum_{i=1}^{k} c_i^2E_i^2 = \sum_{i=1}^{k} c_i^2E_i.$$
i=1 
The reader should compare this with $g(A)$ where $A$ is a diagonal matrix; for then $g(A)$ is simply the diagonal matrix with diagonal entries $g(A_{11}), \ldots, g(A_{nn})$.

We should like in particular to note what happens when one applies the Lagrange polynomials corresponding to the scalars $c_1, \ldots, c_k$:

$$p_j = \prod_{i \neq j} \frac{(x - c_i)}{(c_j - c_i)}.$$

We have $p_j(c_i) = \delta_{ij}$, which means that

$$p_j(T) = \sum_{i=1}^{k} \delta_{ij}E_i = E_j.$$

Thus the projections $E_j$ not only commute with $T$ but are polynomials in $T$.
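The formula $E_j = p_j(T)$ is easy to try out. The following sketch is an illustration, not the text's method of computation. It assumes that the matrix is diagonalizable and that its distinct characteristic values are already known. The sample matrix $\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, whose characteristic values are 1 and 3, and the helper names are our own. The sketch checks (i) to (iii) of Theorem 11 and the identity $g(T) = \sum g(c_i)E_i$ for $g = x^2$.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def mat_add(A, B):
    return [[a + b for a, b in zip(r, s)] for r, s in zip(A, B)]


def mat_scale(c, A):
    return [[c * a for a in row] for row in A]


def spectral_projections(A, cs):
    """E_j = p_j(A), where p_j = prod_{i != j} (x - c_i) / (c_j - c_i) is the
    Lagrange polynomial for the distinct scalars cs."""
    n = len(A)
    projections = []
    for j, cj in enumerate(cs):
        E = [[Fraction(int(r == s)) for s in range(n)] for r in range(n)]  # start from I
        for i, ci in enumerate(cs):
            if i != j:
                # multiply by the factor (A - c_i I) / (c_j - c_i)
                F = [[(A[r][s] - (ci if r == s else 0)) / Fraction(cj - ci) for s in range(n)]
                     for r in range(n)]
                E = mat_mul(E, F)
        projections.append(E)
    return projections


A = [[2, 1], [1, 2]]          # sample diagonalizable matrix, characteristic values 1 and 3
E1, E3 = spectral_projections(A, [1, 3])
assert mat_add(E1, E3) == [[1, 0], [0, 1]]                            # (ii)  I = E_1 + E_2
assert mat_add(mat_scale(1, E1), mat_scale(3, E3)) == A               # (i)   T = c_1 E_1 + c_2 E_2
assert mat_mul(E1, E3) == [[0, 0], [0, 0]]                            # (iii) E_1 E_2 = 0
assert mat_mul(A, A) == mat_add(mat_scale(1, E1), mat_scale(9, E3))   # g(T) for g = x^2
```

Here $E_1 = \tfrac12\begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}$ and $E_3 = \tfrac12\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$. They are obtained by nothing more than evaluating two polynomials at $A$.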

Such calculations with polynomials in $T$ can be used to give an alternative proof of Theorem 6, which characterized diagonalizable operators in terms of their minimal polynomials. The proof is entirely independent of our earlier proof.

If $T$ is diagonalizable, $T = c_1E_1 + \cdots + c_kE_k$, then

$$g(T) = g(c_1)E_1 + \cdots + g(c_k)E_k$$

for every polynomial $g$. Thus $g(T) = 0$ if and only if $g(c_i) = 0$ for each $i$. In particular, the minimal polynomial for $T$ is

$$p = (x - c_1) \cdots (x - c_k).$$

Now suppose $T$ is a linear operator with minimal polynomial $p = (x - c_1) \cdots (x - c_k)$, where $c_1, \ldots, c_k$ are distinct elements of the scalar field. We form the Lagrange polynomials
$$p_j = \prod_{i \neq j} \frac{(x - c_i)}{(c_j - c_i)}.$$

We recall from Chapter 4 that $p_j(c_i) = \delta_{ij}$ and for any polynomial $g$ of degree less than or equal to $(k - 1)$ we have

$$g = g(c_1)p_1 + \cdots + g(c_k)p_k.$$

Taking $g$ to be the scalar polynomial 1 and then the polynomial $x$, we have

$$\begin{aligned} 1 &= p_1 + \cdots + p_k \\ x &= c_1p_1 + \cdots + c_kp_k. \end{aligned} \tag{6-15}$$

(The astute reader will note that the application to $x$ may not be valid because $k$ may be 1. But if $k = 1$, $T$ is a scalar multiple of the identity and hence diagonalizable.) Now let $E_j = p_j(T)$. From (6-15) we have

$$\begin{aligned} I &= E_1 + \cdots + E_k \\ T &= c_1E_1 + \cdots + c_kE_k. \end{aligned} \tag{6-16}$$

Observe that if $i \neq j$, then $p_ip_j$ is divisible by the minimal polynomial $p$, because $p_ip_j$ contains every $(x - c_m)$ as a factor. Thus

$$E_iE_j = 0, \qquad i \neq j. \tag{6-17}$$

We must note one further thing, namely, that $E_i \neq 0$ for each $i$. This is because $p$ is the minimal polynomial for $T$ and so we cannot have $p_i(T) = 0$, since $p_i$ has degree less than the degree of $p$. This last comment, together with (6-16), (6-17), and the fact that the $c_i$ are distinct, enables us to apply Theorem 11 to conclude that $T$ is diagonalizable. ∎
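The argument above gives a mechanical test. Suppose $c_1, \ldots, c_k$ are all of the distinct characteristic values of $T$ in $F$. Each $c_i$ is a root of the minimal polynomial, so the minimal polynomial equals $(x - c_1) \cdots (x - c_k)$ exactly when this product annihilates $T$. Hence $T$ is diagonalizable if and only if $(T - c_1I) \cdots (T - c_kI) = 0$. The minimal sketch below assumes the characteristic values are supplied by the caller; it does not attempt to compute them. The helper name `is_diagonalizable` is hypothetical.

```python
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def is_diagonalizable(A, cs):
    """cs must list all distinct characteristic values of A in the scalar field.
    Returns True iff (A - c_1 I) ... (A - c_k I) is the zero matrix."""
    n = len(A)
    prod = [[int(i == j) for j in range(n)] for i in range(n)]   # identity
    for c in cs:
        shifted = [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]
        prod = mat_mul(prod, shifted)
    return all(x == 0 for row in prod for x in row)


print(is_diagonalizable([[2, 1], [1, 2]], [1, 3]))                     # True
print(is_diagonalizable([[2, 1], [0, 2]], [2]))                        # False
print(is_diagonalizable([[2, 0, 0], [1, 2, 0], [0, 0, -1]], [2, -1]))  # False
```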


Exercises 


1. Let $E$ be a projection of $V$ and let $T$ be a linear operator on $V$. Prove that the range of $E$ is invariant under $T$ if and only if $ETE = TE$. Prove that both the range and null space of $E$ are invariant under $T$ if and only if $ET = TE$.

2. Let $T$ be the linear operator on $R^2$, the matrix of which in the standard ordered basis is

$$\begin{bmatrix} 2 & 1 \\ 0 & 2 \end{bmatrix}.$$

Let $W_1$ be the subspace of $R^2$ spanned by the vector $\epsilon_1 = (1, 0)$.

(a) Prove that $W_1$ is invariant under $T$.

(b) Prove that there is no subspace $W_2$ which is invariant under $T$ and which is complementary to $W_1$:

$$R^2 = W_1 \oplus W_2.$$

(Compare with Exercise 1 of Section 6.5.)


3. Let $T$ be a linear operator on a finite-dimensional vector space $V$. Let $R$ be the range of $T$ and let $N$ be the null space of $T$. Prove that $R$ and $N$ are independent if and only if $V = R \oplus N$.

4. Let $T$ be a linear operator on $V$. Suppose $V = W_1 \oplus \cdots \oplus W_k$, where each $W_i$ is invariant under $T$. Let $T_i$ be the induced (restriction) operator on $W_i$.

(a) Prove that $\det(T) = \det(T_1) \cdots \det(T_k)$.

(b) Prove that the characteristic polynomial for $T$ is the product of the characteristic polynomials for $T_1, \ldots, T_k$.

(c) Prove that the minimal polynomial for $T$ is the least common multiple of the minimal polynomials for $T_1, \ldots, T_k$. (Hint: Prove and then use the corresponding facts about direct sums of matrices.)

5. Let $T$ be the diagonalizable linear operator on $R^3$ which we discussed in Example 3 of Section 6.2. Use the Lagrange polynomials to write the representing matrix $A$ in the form $A = E_1 + 2E_2$, $E_1 + E_2 = I$, $E_1E_2 = 0$.

6. Let $A$ be the $4 \times 4$ matrix in Example 6 of Section 6.3. Find matrices $E_1, E_2, E_3$ such that $A = c_1E_1 + c_2E_2 + c_3E_3$, $E_1 + E_2 + E_3 = I$, and $E_iE_j = 0$, $i \neq j$.

7. In Exercises 5 and 6, notice that (for each $i$) the space of characteristic vectors associated with the characteristic value $c_i$ is spanned by the column vectors of the various matrices $E_j$ with $j \neq i$. Is that a coincidence?

8. Let $T$ be a linear operator on $V$ which commutes with every projection operator on $V$. What can you say about $T$?


9. Let $V$ be the vector space of continuous real-valued functions on the interval $[-1, 1]$ of the real line. Let $W_e$ be the subspace of even functions, $f(-x) = f(x)$, and let $W_o$ be the subspace of odd functions, $f(-x) = -f(x)$.

(a) Show that $V = W_e \oplus W_o$.

(b) If $T$ is the indefinite integral operator

$$(Tf)(x) = \int_0^x f(t)\,dt,$$

are $W_e$ and $W_o$ invariant under $T$?


6.8. The Primary Decomposition Theorem 


We are trying to study a linear operator T on the finite-dimensional 
space V, by decomposing T into a direct sum of operators which are in 
some sense elementary. We can do this through the characteristic values 
and vectors of T in certain special cases, i.e., when the minimal polynomial 
for T factors over the scalar field F into a product of distinct monic poly- 
nomials of degree 1. What can we do with the general T? If we try to study 
T using characteristic values, we are confronted with two problems. First, 
T may not have a single characteristic value; this is really a deficiency in 
the scalar field, namely, that it is not algebraically closed. Second, even if 
the characteristic polynomial factors completely over F into a product of 
polynomials of degree 1, there may not be enough characteristic vectors for 
T to span the space V; this is clearly a deficiency in T. The second situation 
is illustrated by the operator $T$ on $F^3$ ($F$ any field) represented in the standard basis by

$$A = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

The characteristic polynomial for $A$ is $(x - 2)^2(x + 1)$ and this is plainly also the minimal polynomial for $A$ (or for $T$). Thus $T$ is not diagonalizable. One sees that this happens because the null space of $(T - 2I)$ has dimension 1 only. On the other hand, the null space of $(T + I)$ and the null space of $(T - 2I)^2$ together span $V$, the former being the subspace spanned by $\epsilon_3$ and the latter the subspace spanned by $\epsilon_1$ and $\epsilon_2$.
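The dimensions just quoted can be confirmed by row reduction. This sketch computes nullities in exact rational arithmetic; the helper names `rank`, `nullity` and `mat_mul` are our own. It shows that $(A - 2I)$ has a 1-dimensional null space, while $(A - 2I)^2$ and $(A + I)$ have null spaces of dimensions 2 and 1. These add up to $\dim V = 3$.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def rank(M):
    """Row-reduce a copy of M over the rationals and count the pivots."""
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue                      # no pivot in this column
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                factor = M[i][c] / M[r][c]
                M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r


def nullity(M):
    return len(M[0]) - rank(M)            # dimension of the null space = n - rank


A = [[2, 0, 0], [1, 2, 0], [0, 0, -1]]
A_minus_2I = [[A[i][j] - 2 * (i == j) for j in range(3)] for i in range(3)]
A_plus_I = [[A[i][j] + (i == j) for j in range(3)] for i in range(3)]
print(nullity(A_minus_2I), nullity(mat_mul(A_minus_2I, A_minus_2I)), nullity(A_plus_I))  # 1 2 1
```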
This will be more or less our general method for the second problem. If (remember this is an assumption) the minimal polynomial for $T$ decomposes

$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}$$

where $c_1, \ldots, c_k$ are distinct elements of $F$, then we shall show that the space $V$ is the direct sum of the null spaces of $(T - c_iI)^{r_i}$, $i = 1, \ldots, k$. The hypothesis about $p$ is equivalent to the fact that $T$ is triangulable (Theorem 5); however, that knowledge will not help us.

The theorem which we prove is more general than what we have 
described, since it works with the primary decomposition of the minimal 
polynomial, whether or not the primes which enter are all of first degree. 
The reader will find it helpful to think of the special case when the primes 
are of degree 1, and even more particularly, to think of the projection-type 
proof of Theorem 6, a special case of this theorem. 


Theorem 12 (Primary Decomposition Theorem). Let $T$ be a linear operator on the finite-dimensional vector space $V$ over the field $F$. Let $p$ be the minimal polynomial for $T$,

$$p = p_1^{r_1} \cdots p_k^{r_k}$$

where the $p_i$ are distinct irreducible monic polynomials over $F$ and the $r_i$ are positive integers. Let $W_i$ be the null space of $p_i(T)^{r_i}$, $i = 1, \ldots, k$. Then

(i) $V = W_1 \oplus \cdots \oplus W_k$;
(ii) each $W_i$ is invariant under $T$;
(iii) if $T_i$ is the operator induced on $W_i$ by $T$, then the minimal polynomial for $T_i$ is $p_i^{r_i}$.

Proof. The idea of the proof is this. If the direct-sum decomposition (i) is valid, how can we get hold of the projections $E_1, \ldots, E_k$ associated with the decomposition? The projection $E_i$ will be the identity on $W_i$ and zero on the other $W_j$. We shall find a polynomial $h_i$ such that $h_i(T)$ is the identity on $W_i$ and is zero on the other $W_j$, and so that $h_1(T) + \cdots + h_k(T) = I$, etc.
For each $i$, let

$$f_i = \frac{p}{p_i^{r_i}} = \prod_{j \neq i} p_j^{r_j}.$$

Since $p_1, \ldots, p_k$ are distinct prime polynomials, the polynomials $f_1, \ldots, f_k$ are relatively prime (Theorem 10, Chapter 4). Thus there are polynomials $g_1, \ldots, g_k$ such that

$$\sum_{i=1}^{k} f_ig_i = 1.$$

Note also that if $i \neq j$, then $f_if_j$ is divisible by the polynomial $p$, because $f_if_j$ contains each $p_m^{r_m}$ as a factor. We shall show that the polynomials $h_i = f_ig_i$ behave in the manner described in the first paragraph of the proof.

Let $E_i = h_i(T) = f_i(T)g_i(T)$. Since $h_1 + \cdots + h_k = 1$ and $p$ divides $f_if_j$ for $i \neq j$, we have

$$\begin{aligned} E_1 + \cdots + E_k &= I \\ E_iE_j &= 0, \quad \text{if } i \neq j. \end{aligned}$$

Thus the $E_i$ are projections which correspond to some direct-sum decomposition of the space $V$. We wish to show that the range of $E_i$ is exactly the subspace $W_i$. It is clear that each vector in the range of $E_i$ is in $W_i$, for if $\alpha$ is in the range of $E_i$, then $\alpha = E_i\alpha$ and so

$$p_i(T)^{r_i}\alpha = p_i(T)^{r_i}E_i\alpha = p_i(T)^{r_i}f_i(T)g_i(T)\alpha = 0$$

because $p_i^{r_i}f_ig_i$ is divisible by the minimal polynomial $p$. Conversely, suppose that $\alpha$ is in the null space of $p_i(T)^{r_i}$. If $j \neq i$, then $f_jg_j$ is divisible by $p_i^{r_i}$ and so $f_j(T)g_j(T)\alpha = 0$, i.e., $E_j\alpha = 0$ for $j \neq i$. But then it is immediate that $E_i\alpha = \alpha$, i.e., that $\alpha$ is in the range of $E_i$. This completes the proof of statement (i).

It is certainly clear that the subspaces $W_i$ are invariant under $T$. If $T_i$ is the operator induced on $W_i$ by $T$, then evidently $p_i(T_i)^{r_i} = 0$, because by definition $p_i(T)^{r_i}$ is 0 on the subspace $W_i$. This shows that the minimal polynomial for $T_i$ divides $p_i^{r_i}$. Conversely, let $g$ be any polynomial such that $g(T_i) = 0$. Since $p_i(T)^{r_i}f_i(T) = p(T) = 0$, the range of $f_i(T)$ is contained in $W_i$; hence $g(T)f_i(T) = 0$. Thus $gf_i$ is divisible by the minimal polynomial $p$ of $T$, i.e., $p_i^{r_i}f_i$ divides $gf_i$. It is easily seen that $p_i^{r_i}$ divides $g$. Hence the minimal polynomial for $T_i$ is $p_i^{r_i}$. ∎
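The proof is constructive. For $k = 2$ it can be carried out by machine: the extended Euclidean algorithm produces $g_1, g_2$ with $f_1g_1 + f_2g_2 = 1$, and then $E_i = f_i(T)g_i(T)$. The sketch below is only an illustration. It uses exact rational coefficients, hypothetical helper names, and the $3 \times 3$ matrix $A$ introduced earlier in this section, whose minimal polynomial is $(x - 2)^2(x + 1)$. Polynomials are stored as coefficient lists, lowest degree first.

```python
from fractions import Fraction


def trim(p):
    """Drop leading zero coefficients (keep at least one entry)."""
    p = list(p)
    while len(p) > 1 and p[-1] == 0:
        p.pop()
    return p


def poly_add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def poly_mul(p, q):
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)


def poly_divmod(a, b):
    """(q, r) with a = q*b + r and deg r < deg b; b must be non-zero."""
    r, b = trim([Fraction(c) for c in a]), trim([Fraction(c) for c in b])
    q = [Fraction(0)] * max(len(r) - len(b) + 1, 1)
    while any(r) and len(r) >= len(b):
        shift, coef = len(r) - len(b), r[-1] / b[-1]
        q[shift] = coef
        for i, c in enumerate(b):
            r[i + shift] -= coef * c
        r = trim(r)
    return trim(q), r


def poly_xgcd(a, b):
    """Extended Euclid: returns (g, s, t) with s*a + t*b = g and g monic."""
    r0, r1 = trim([Fraction(c) for c in a]), trim([Fraction(c) for c in b])
    s0, s1, t0, t1 = [Fraction(1)], [Fraction(0)], [Fraction(0)], [Fraction(1)]
    while any(r1):
        q, r = poly_divmod(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, poly_add(s0, [-c for c in poly_mul(q, s1)])
        t0, t1 = t1, poly_add(t0, [-c for c in poly_mul(q, t1)])
    lead = r0[-1]
    return [c / lead for c in r0], [c / lead for c in s0], [c / lead for c in t0]


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def poly_at(p, A):
    """p(A) by Horner's rule: R <- R*A + c*I, highest coefficient first."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(p):
        R = mat_mul(R, A)
        for i in range(n):
            R[i][i] += c
    return R


# p = (x - 2)^2 (x + 1): p_1^{r_1} = (x - 2)^2 and p_2^{r_2} = x + 1.
A = [[2, 0, 0], [1, 2, 0], [0, 0, -1]]
f1 = [Fraction(1), Fraction(1)]                                          # f_1 = x + 1
f2 = poly_mul([Fraction(-2), Fraction(1)], [Fraction(-2), Fraction(1)])  # f_2 = (x - 2)^2
g, g1, g2 = poly_xgcd(f1, f2)             # g1*f1 + g2*f2 = g
assert g == [1]                           # f_1 and f_2 are relatively prime
E1 = poly_at(poly_mul(f1, g1), A)         # E_1 = f_1(T) g_1(T)
E2 = poly_at(poly_mul(f2, g2), A)         # E_2 = f_2(T) g_2(T)
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
assert [[a + b for a, b in zip(r, s)] for r, s in zip(E1, E2)] == I   # E_1 + E_2 = I
assert mat_mul(E1, E2) == [[0] * 3 for _ in range(3)]                 # E_1 E_2 = 0
```

Here Euclid's algorithm gives $g_1 = (5 - x)/9$ and $g_2 = 1/9$. The projections come out as $E_1 = \operatorname{diag}(1, 1, 0)$, onto $W_1$ (the null space of $(A - 2I)^2$), and $E_2 = \operatorname{diag}(0, 0, 1)$, onto $W_2$ (the null space of $(A + I)$). This agrees with the description of these two null spaces given earlier in the section.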


Corollary. If $E_1, \ldots, E_k$ are the projections associated with the primary decomposition of $T$, then each $E_i$ is a polynomial in $T$, and accordingly if a linear operator $U$ commutes with $T$ then $U$ commutes with each of the $E_i$, i.e., each subspace $W_i$ is invariant under $U$.


In the notation of the proof of Theorem 12, let us take a look at the special case in which the minimal polynomial for $T$ is a product of first-degree polynomials, i.e., the case in which each $p_i$ is of the form $p_i = x - c_i$. Now the range of $E_i$ is the null space $W_i$ of $(T - c_iI)^{r_i}$. Let us put $D = c_1E_1 + \cdots + c_kE_k$. By Theorem 11, $D$ is a diagonalizable operator which we shall call the diagonalizable part of $T$. Let us look at the operator $N = T - D$. Now

$$\begin{aligned} T &= TE_1 + \cdots + TE_k \\ D &= c_1E_1 + \cdots + c_kE_k \end{aligned}$$

so

$$N = (T - c_1I)E_1 + \cdots + (T - c_kI)E_k.$$

The reader should be familiar enough with projections by now so that he sees that

$$N^2 = (T - c_1I)^2E_1 + \cdots + (T - c_kI)^2E_k$$

and in general that

$$N^r = (T - c_1I)^rE_1 + \cdots + (T - c_kI)^rE_k.$$

When $r \geq r_i$ for each $i$, we shall have $N^r = 0$, because the operator $(T - c_iI)^r$ will then be 0 on the range of $E_i$.
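For the $3 \times 3$ example with minimal polynomial $(x - 2)^2(x + 1)$, the decomposition $T = D + N$ can be written down explicitly. This sketch is an illustration only. It takes the polynomials $h_1 = (x + 1)(5 - x)/9$ and $h_2 = (x - 2)^2/9$ as given. These were worked out by hand from $f_1 = x + 1$, $f_2 = (x - 2)^2$ and one choice of $g_1 = (5 - x)/9$, $g_2 = 1/9$ satisfying $f_1g_1 + f_2g_2 = 1$. The sketch forms $E_i = h_i(A)$ and checks that $N = A - D$ satisfies $N^2 = 0$ and commutes with $D$.

```python
from fractions import Fraction


def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def poly_at(p, A):
    """p(A) by Horner's rule; p lists coefficients lowest degree first."""
    n = len(A)
    R = [[Fraction(0)] * n for _ in range(n)]
    for c in reversed(p):
        R = mat_mul(R, A)
        for i in range(n):
            R[i][i] += c
    return R


A = [[2, 0, 0], [1, 2, 0], [0, 0, -1]]                  # minimal polynomial (x - 2)^2 (x + 1)
h1 = [Fraction(5, 9), Fraction(4, 9), Fraction(-1, 9)]   # h_1 = (x + 1)(5 - x)/9
h2 = [Fraction(4, 9), Fraction(-4, 9), Fraction(1, 9)]   # h_2 = (x - 2)^2 / 9
E1, E2 = poly_at(h1, A), poly_at(h2, A)
c1, c2 = 2, -1

D = [[c1 * a + c2 * b for a, b in zip(r, s)] for r, s in zip(E1, E2)]   # D = c_1 E_1 + c_2 E_2
N = [[a - d for a, d in zip(r, s)] for r, s in zip(A, D)]              # N = T - D
Z = [[0] * 3 for _ in range(3)]
assert mat_mul(N, N) == Z                 # N^r = 0 once r >= max(r_1, r_2) = 2
assert mat_mul(D, N) == mat_mul(N, D)     # D and N commute
```

The result is $D = \operatorname{diag}(2, 2, -1)$, and $N = A - D$ has a single non-zero entry, a 1 in row 2, column 1.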


Definition. Let $N$ be a linear operator on the vector space $V$. We say that $N$ is nilpotent if there is some positive integer $r$ such that $N^r = 0$.


Theorem 13. Let $T$ be a linear operator on the finite-dimensional vector space $V$ over the field $F$. Suppose that the minimal polynomial for $T$ decomposes over $F$ into a product of linear polynomials. Then there is a diagonalizable operator $D$ on $V$ and a nilpotent operator $N$ on $V$ such that

(i) $T = D + N$,
(ii) $DN = ND$.

The diagonalizable operator $D$ and the nilpotent operator $N$ are uniquely determined by (i) and (ii) and each of them is a polynomial in $T$.


Proof. We have just observed that we can write $T = D + N$ where $D$ is diagonalizable and $N$ is nilpotent, and where $D$ and $N$ not only commute but are polynomials in $T$. Now suppose that we also have $T = D' + N'$ where $D'$ is diagonalizable, $N'$ is nilpotent, and $D'N' = N'D'$. We shall prove that $D = D'$ and $N = N'$.

Since $D'$ and $N'$ commute with one another and $T = D' + N'$, we see that $D'$ and $N'$ commute with $T$. Thus $D'$ and $N'$ commute with any polynomial in $T$; hence they commute with $D$ and with $N$. Now we have

$$D + N = D' + N'$$

or

$$D - D' = N' - N$$

and all four of these operators commute with one another. Since $D$ and $D'$ are both diagonalizable and they commute, they are simultaneously diagonalizable, and $D - D'$ is diagonalizable. Since $N$ and $N'$ are both nilpotent and they commute, the operator $(N' - N)$ is nilpotent; for, using the fact that $N$ and $N'$ commute,

$$(N' - N)^r = \sum_{j=0}^{r} \binom{r}{j} (N')^{r-j}(-N)^j$$

and so when $r$ is sufficiently large every term in this expression for $(N' - N)^r$ will be 0. (Actually, a nilpotent operator on an $n$-dimensional space must have its $n$th power 0; if we take $r = 2n$ above, that will be large enough. It then follows that $r = n$ is large enough, but this is not obvious from the above expression.) Now $D - D'$ is a diagonalizable operator which is also nilpotent. Such an operator is obviously the zero operator; for since it is nilpotent, the minimal polynomial for this operator is of the form $x^r$ for some positive integer $r$; but then since the operator is diagonalizable, the minimal polynomial cannot have a repeated root; hence $r = 1$ and the minimal polynomial is simply $x$, which says the operator is 0. Thus we see that $D = D'$ and $N = N'$. ∎


Corollary. Let V be a finite-dimensional vector space over an algebra- 
ically closed field F, e.g., the field of complex numbers. Then every linear 
operator T on V can be written as the sum of a diagonalizable operator D 
and a nilpotent operator N which commute. These operators D and N are 
unique and each is a polynomial in T. 


From these results, one sees that the study of linear operators on 
vector spaces over an algebraically closed field is essentially reduced to 
the study of nilpotent operators. For vector spaces over non-algebraically 
closed fields, we still need to find some substitute for characteristic values 
and vectors. It is a very interesting fact that these two problems can be 
handled simultaneously and this is what we shall do in the next chapter.

In concluding this section, we should like to give an example which 
illustrates some of the ideas of the primary decomposition theorem. We 
have chosen to give it at the end of the section since it deals with differential 
equations and thus is not purely linear algebra. 


EXAMPLE 14. In the primary decomposition theorem, it is not necessary that the vector space $V$ be finite dimensional, nor is it necessary for parts (i) and (ii) that $p$ be the minimal polynomial for $T$. If $T$ is a linear operator on an arbitrary vector space and if there is a monic polynomial $p$ such that $p(T) = 0$, then parts (i) and (ii) of Theorem 12 are valid for $T$ with the proof which we gave.

Let n be a positive integer and let V be the space of all n times con- 
tinuously differentiable functions f on the real line which satisfy the 
differential equation 




$$\frac{d^nf}{dt^n} + a_{n-1}\frac{d^{n-1}f}{dt^{n-1}} + \cdots + a_1\frac{df}{dt} + a_0f = 0 \tag{6-18}$$

where $a_0, \ldots, a_{n-1}$ are some fixed constants. If $C_n$ denotes the space of $n$ times continuously differentiable functions, then the space $V$ of solutions of this differential equation is a subspace of $C_n$. If $D$ denotes the differentiation operator and $p$ is the polynomial

$$p = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$$

then $V$ is the null space of the operator $p(D)$, because (6-18) simply says $p(D)f = 0$. Therefore, $V$ is invariant under $D$. Let us now regard $D$ as a linear operator on the subspace $V$. Then $p(D) = 0$.

If we are discussing differentiable complex-valued functions, then $C_n$ and $V$ are complex vector spaces, and $a_0, \ldots, a_{n-1}$ may be any complex numbers. We now write

$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}$$

where $c_1, \ldots, c_k$ are distinct complex numbers. If $W_j$ is the null space of $(D - c_jI)^{r_j}$, then Theorem 12 says that

$$V = W_1 \oplus \cdots \oplus W_k.$$

In other words, if $f$ satisfies the differential equation (6-18), then $f$ is uniquely expressible in the form

$$f = f_1 + \cdots + f_k$$

where $f_j$ satisfies the differential equation $(D - c_jI)^{r_j}f_j = 0$. Thus, the study of the solutions to the equation (6-18) is reduced to the study of the space of solutions of a differential equation of the form

$$(D - cI)^r f = 0. \tag{6-19}$$

This reduction has been accomplished by the general methods of linear algebra, i.e., by the primary decomposition theorem.

To describe the space of solutions to (6-19), one must know something about differential equations, that is, one must know something about $D$ other than the fact that it is a linear operator. However, one does not need to know very much. It is very easy to establish by induction on $r$ that if $f$ is in $C_r$, then

$$(D - cI)^r f = e^{ct}D^r(e^{-ct}f),$$

that is,

$$\frac{df}{dt} - cf(t) = e^{ct}\frac{d}{dt}\left(e^{-ct}f\right), \quad \text{etc.}$$

Thus $(D - cI)^r f = 0$ if and only if $D^r(e^{-ct}f) = 0$. A function $g$ such that $D^r g = 0$, i.e., $d^rg/dt^r = 0$, must be a polynomial function of degree $(r - 1)$ or less:

$$g(t) = b_0 + b_1t + \cdots + b_{r-1}t^{r-1}.$$

Thus $f$ satisfies (6-19) if and only if $f$ has the form

$$f(t) = e^{ct}(b_0 + b_1t + \cdots + b_{r-1}t^{r-1}).$$


Accordingly, the 'functions' $e^{ct}, te^{ct}, \ldots, t^{r-1}e^{ct}$ span the space of solutions of (6-19). Since $1, t, \ldots, t^{r-1}$ are linearly independent functions and the exponential function has no zeros, these $r$ functions $t^je^{ct}$, $0 \leq j \leq r - 1$, form a basis for the space of solutions.

Returning to the differential equation (6-18), which is

$$p(D)f = 0, \qquad p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k},$$

we see that the $n$ functions $t^me^{c_jt}$, $0 \leq m \leq r_j - 1$, $1 \leq j \leq k$, form a basis for the space of solutions to (6-18). In particular, the space of solutions is finite-dimensional and has dimension equal to the degree of the polynomial $p$.
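The basis just described can be checked with a small symbolic sketch. It is restricted, by assumption, to functions of the special form $e^{at}q(t)$ with $q$ a polynomial. Each such function is stored as a pair `(a, q)`, and the sketch applies the rule $D(e^{at}q) = e^{at}(aq + q')$. The polynomial $p = (x - 1)^2(x + 2) = x^3 - 3x + 2$ is a sample instance of (6-18) with $n = 3$, not one from the text. The three basis functions $e^t$, $te^t$, $e^{-2t}$ are annihilated by $p(D)$, while $t^2e^t$ is not.

```python
# A function t -> e^{a t} q(t) is stored as a pair (a, q), where q lists the
# coefficients of the polynomial q, lowest degree first.


def D(f):
    """Differentiate: d/dt [e^{at} q(t)] = e^{at} (a q(t) + q'(t))."""
    a, q = f
    dq = [k * q[k] for k in range(1, len(q))] + [0]   # q', padded to len(q)
    return (a, [a * c + d for c, d in zip(q, dq)])


def apply_poly(p, f):
    """p(D) f for p = p[0] + p[1] x + ... (coefficients lowest degree first)."""
    a, q = f
    total, g = [0] * len(q), f
    for coeff in p:                                   # add coeff * D^k f
        total = [t + coeff * c for t, c in zip(total, g[1])]
        g = D(g)
    return (a, total)


def is_zero(f):
    return all(c == 0 for c in f[1])


# Sample instance of (6-18): p = (x - 1)^2 (x + 2) = x^3 - 3x + 2, so n = 3.
p = [2, -3, 0, 1]
basis = [(1, [1]), (1, [0, 1]), (-2, [1])]     # e^t, t e^t, e^{-2t}
assert all(is_zero(apply_poly(p, f)) for f in basis)
print(apply_poly(p, (1, [0, 0, 1])))           # (1, [6, 0, 0]): p(D)(t^2 e^t) = 6 e^t
```

The code uses only `+`, `*` and `==` on the coefficients, so complex values of $a$ (Python `complex`) can be used in the same way.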


Exercises 


1. Let $T$ be a linear operator on $R^3$ which is represented in the standard ordered basis by the matrix

$$\begin{bmatrix} 6 & -3 & -2 \\ 4 & -1 & -2 \\ 10 & -5 & -3 \end{bmatrix}.$$

Express the minimal polynomial $p$ for $T$ in the form $p = p_1p_2$, where $p_1$ and $p_2$ are monic and irreducible over the field of real numbers. Let $W_i$ be the null space of $p_i(T)$. Find bases $\mathcal{B}_i$ for the spaces $W_1$ and $W_2$. If $T_i$ is the operator induced on $W_i$ by $T$, find the matrix of $T_i$ in the basis $\mathcal{B}_i$ (above).


2. Let $T$ be the linear operator on $R^3$ which is represented by the matrix

$$\begin{bmatrix} 3 & 1 & -1 \\ 2 & 2 & -1 \\ 2 & 2 & 0 \end{bmatrix}$$

in the standard ordered basis. Show that there is a diagonalizable operator $D$ on $R^3$ and a nilpotent operator $N$ on $R^3$ such that $T = D + N$ and $DN = ND$. Find the matrices of $D$ and $N$ in the standard basis. (Just repeat the proof of Theorem 12 for this special case.)


3. If V is the space of all polynomials of degree less than or equal to n over a 
field F, prove that the differentiation operator on V is nilpotent. 


4. Let $T$ be a linear operator on the finite-dimensional space $V$ with characteristic polynomial

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}$$

and minimal polynomial

$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}.$$

Let $W_i$ be the null space of $(T - c_iI)^{r_i}$.

(a) Prove that $W_i$ is the set of all vectors $\alpha$ in $V$ such that $(T - c_iI)^m\alpha = 0$ for some positive integer $m$ (which may depend upon $\alpha$).

(b) Prove that the dimension of $W_i$ is $d_i$. (Hint: If $T_i$ is the operator induced on $W_i$ by $T$, then $T_i - c_iI$ is nilpotent; thus the characteristic polynomial for $T_i - c_iI$ must be $x^{e_i}$ where $e_i$ is the dimension of $W_i$ (proof?); thus the characteristic polynomial of $T_i$ is $(x - c_i)^{e_i}$; now use the fact that the characteristic polynomial for $T$ is the product of the characteristic polynomials of the $T_i$ to show that $e_i = d_i$.)

5. Let $V$ be a finite-dimensional vector space over the field of complex numbers. Let $T$ be a linear operator on $V$ and let $D$ be the diagonalizable part of $T$. Prove that if $g$ is any polynomial with complex coefficients, then the diagonalizable part of $g(T)$ is $g(D)$.

6. Let $V$ be a finite-dimensional vector space over the field $F$, and let $T$ be a linear operator on $V$ such that $\operatorname{rank}(T) = 1$. Prove that either $T$ is diagonalizable or $T$ is nilpotent, not both.

7. Let $V$ be a finite-dimensional vector space over $F$, and let $T$ be a linear operator on $V$. Suppose that $T$ commutes with every diagonalizable linear operator on $V$. Prove that $T$ is a scalar multiple of the identity operator.

8. Let $V$ be the space of $n \times n$ matrices over a field $F$, and let $A$ be a fixed $n \times n$ matrix over $F$. Define a linear operator $T$ on $V$ by $T(B) = AB - BA$. Prove that if $A$ is a nilpotent matrix, then $T$ is a nilpotent operator.

9. Give an example of two $4 \times 4$ nilpotent matrices which have the same minimal polynomial (they necessarily have the same characteristic polynomial) but which are not similar.

10. Let $T$ be a linear operator on the finite-dimensional space $V$, let $p = p_1^{r_1} \cdots p_k^{r_k}$ be the minimal polynomial for $T$, and let $V = W_1 \oplus \cdots \oplus W_k$ be the primary decomposition for $T$, i.e., $W_j$ is the null space of $p_j(T)^{r_j}$. Let $W$ be any subspace of $V$ which is invariant under $T$. Prove that

$$W = (W \cap W_1) \oplus (W \cap W_2) \oplus \cdots \oplus (W \cap W_k).$$

11. What's wrong with the following proof of Theorem 13? Suppose that the minimal polynomial for $T$ is a product of linear factors. Then, by Theorem 5, $T$ is triangulable. Let $\mathcal{B}$ be an ordered basis such that $A = [T]_{\mathcal{B}}$ is upper-triangular. Let $D$ be the diagonal matrix with diagonal entries $a_{11}, \ldots, a_{nn}$. Then $A = D + N$, where $N$ is strictly upper-triangular. Evidently $N$ is nilpotent.

12. If you thought about Exercise 11, think about it again, after you observe what Theorem 7 tells you about the diagonalizable and nilpotent parts of $T$.

13. Let $T$ be a linear operator on $V$ with minimal polynomial of the form $p^n$, where $p$ is irreducible over the scalar field. Show that there is a vector $\alpha$ in $V$ such that the $T$-annihilator of $\alpha$ is $p^n$.

14. Use the primary decomposition theorem and the result of Exercise 13 to prove the following. If $T$ is any linear operator on a finite-dimensional vector space $V$, then there is a vector $\alpha$ in $V$ with $T$-annihilator equal to the minimal polynomial for $T$.

15. If $N$ is a nilpotent linear operator on an $n$-dimensional vector space $V$, then the characteristic polynomial for $N$ is $x^n$.


7. The Rational and Jordan Forms


7.1. Cyclic Subspaces and Annihilators 


Once again $V$ is a finite-dimensional vector space over the field $F$ and $T$ is a fixed (but arbitrary) linear operator on $V$. If $\alpha$ is any vector in $V$, there is a smallest subspace of $V$ which is invariant under $T$ and contains $\alpha$. This subspace can be defined as the intersection of all $T$-invariant subspaces which contain $\alpha$; however, it is more profitable at the moment for us to look at things this way. If $W$ is any subspace of $V$ which is invariant under $T$ and contains $\alpha$, then $W$ must also contain the vector $T\alpha$; hence $W$ must contain $T(T\alpha) = T^2\alpha$, $T(T^2\alpha) = T^3\alpha$, etc. In other words, $W$ must contain $g(T)\alpha$ for every polynomial $g$ over $F$. The set of all vectors of the form $g(T)\alpha$, with $g$ in $F[x]$, is clearly invariant under $T$, and is thus the smallest $T$-invariant subspace which contains $\alpha$.


Definition. If $\alpha$ is any vector in V, the T-cyclic subspace generated
by $\alpha$ is the subspace $Z(\alpha; T)$ of all vectors of the form $g(T)\alpha$, g in $F[x]$.
If $Z(\alpha; T) = V$, then $\alpha$ is called a cyclic vector for T.


Another way of describing the subspace $Z(\alpha; T)$ is that $Z(\alpha; T)$ is
the subspace spanned by the vectors $T^k\alpha$, $k \ge 0$, and thus $\alpha$ is a cyclic
vector for T if and only if these vectors span V. We caution the reader
that a general operator T need not have any cyclic vector.


EXAMPLE 1. For any T, the T-cyclic subspace generated by the zero
vector is the zero subspace. The space $Z(\alpha; T)$ is one-dimensional if and
only if $\alpha$ is a characteristic vector for T. For the identity operator, every
non-zero vector generates a one-dimensional cyclic subspace; thus, if
$\dim V > 1$, the identity operator has no cyclic vector. An example of an
operator which has a cyclic vector is the linear operator T on $F^2$ which is
represented in the standard ordered basis by the matrix

$$\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$

Here the cyclic vector (a cyclic vector) is $\epsilon_1$; for, if $\beta = (a, b)$, then with
$g = a + bx$ we have $\beta = g(T)\epsilon_1$. For this same operator T, the cyclic
subspace generated by $\epsilon_2$ is the one-dimensional space spanned by $\epsilon_2$,
because $\epsilon_2$ is a characteristic vector of T.
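
The test "α is cyclic exactly when the vectors $T^k\alpha$ span V" can be checked mechanically. Since $Z(\alpha; T)$ has dimension at most $n = \dim V$, and once some $T^j\alpha$ depends on the earlier powers all later powers do too, it suffices to compute the rank of $\alpha, T\alpha, \ldots, T^{n-1}\alpha$. The following is a minimal Python sketch, assuming a matrix with rational entries and using exact `Fraction` arithmetic; the names `rank`, `cyclic_dim`, and `is_cyclic_vector` are illustrative only.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def cyclic_dim(A, alpha):
    """dim Z(alpha; T) for the operator T whose matrix A is a list of rows."""
    n = len(A)
    powers, v = [], [Fraction(x) for x in alpha]
    for _ in range(n):  # alpha, T alpha, ..., T^(n-1) alpha
        powers.append(v)
        v = [sum(a * x for a, x in zip(row, v)) for row in A]
    return rank(powers)


def is_cyclic_vector(A, alpha):
    return cyclic_dim(A, alpha) == len(A)


if __name__ == "__main__":
    T = [[0, 0], [1, 0]]  # the operator of Example 1
    I = [[1, 0], [0, 1]]  # the identity operator on F^2
    print(is_cyclic_vector(T, [1, 0]))  # True: epsilon_1 is cyclic
    print(cyclic_dim(T, [0, 1]))        # 1: Z(epsilon_2; T) is one-dimensional
    print(cyclic_dim(I, [1, 1]))        # 1: the identity has no cyclic vector
```

Exact fractions are used so that the rank is not disturbed by floating-point round-off.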

For any T and $\alpha$, we shall be interested in linear relations

$$c_0\alpha + c_1T\alpha + \cdots + c_kT^k\alpha = 0$$

between the vectors $T^j\alpha$, that is, we shall be interested in the polynomials
$g = c_0 + c_1x + \cdots + c_kx^k$ which have the property that $g(T)\alpha = 0$. The
set of all g in $F[x]$ such that $g(T)\alpha = 0$ is clearly an ideal in $F[x]$. It is also
a non-zero ideal, because it contains the minimal polynomial p of the
operator T ($p(T)\alpha = 0$ for every $\alpha$ in V).


Definition. If $\alpha$ is any vector in V, the T-annihilator of $\alpha$ is the ideal
$M(\alpha; T)$ in $F[x]$ consisting of all polynomials g over F such that $g(T)\alpha = 0$.
The unique monic polynomial $p_\alpha$ which generates this ideal will also be
called the T-annihilator of $\alpha$.


As we pointed out above, the T-annihilator $p_\alpha$ divides the minimal
polynomial of the operator T. The reader should also note that $\deg(p_\alpha) > 0$
unless $\alpha$ is the zero vector.


Theorem 1. Let $\alpha$ be any non-zero vector in V and let $p_\alpha$ be the
T-annihilator of $\alpha$.


(i) The degree of $p_\alpha$ is equal to the dimension of the cyclic subspace
$Z(\alpha; T)$.

(ii) If the degree of $p_\alpha$ is k, then the vectors $\alpha, T\alpha, T^2\alpha, \ldots, T^{k-1}\alpha$
form a basis for $Z(\alpha; T)$.

(iii) If U is the linear operator on $Z(\alpha; T)$ induced by T, then the minimal
polynomial for U is $p_\alpha$.

Proof. Let g be any polynomial over the field F. Write

$$g = p_\alpha q + r$$

where either $r = 0$ or $\deg(r) < \deg(p_\alpha) = k$. The polynomial $p_\alpha q$ is in
the T-annihilator of $\alpha$, and so

$$g(T)\alpha = r(T)\alpha.$$

Since $r = 0$ or $\deg(r) < k$, the vector $r(T)\alpha$ is a linear combination of
the vectors $\alpha, T\alpha, \ldots, T^{k-1}\alpha$, and since $g(T)\alpha$ is a typical vector in
$Z(\alpha; T)$, this shows that these k vectors span $Z(\alpha; T)$. These vectors are
certainly linearly independent, because any non-trivial linear relation
between them would give us a non-zero polynomial g such that $g(T)\alpha = 0$
and $\deg(g) < \deg(p_\alpha)$, which is absurd. This proves (i) and (ii).

Let U be the linear operator on $Z(\alpha; T)$ obtained by restricting T to
that subspace. If g is any polynomial over F, then

$$\begin{aligned} p_\alpha(U)g(T)\alpha &= p_\alpha(T)g(T)\alpha \\ &= g(T)p_\alpha(T)\alpha \\ &= g(T)0 \\ &= 0. \end{aligned}$$

Thus the operator $p_\alpha(U)$ sends every vector in $Z(\alpha; T)$ into 0 and is the
zero operator on $Z(\alpha; T)$. Furthermore, if h is a polynomial of degree
less than k, we cannot have $h(U) = 0$, for then $h(U)\alpha = h(T)\alpha = 0$,
contradicting the definition of $p_\alpha$. This shows that $p_\alpha$ is the minimal
polynomial for U. ∎
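
Theorem 1 suggests a direct way to compute $p_\alpha$: form $\alpha, T\alpha, T^2\alpha, \ldots$ and stop at the first power $T^k\alpha$ that is a linear combination of the earlier ones. If $T^k\alpha = c_0\alpha + \cdots + c_{k-1}T^{k-1}\alpha$, then $x^k - c_{k-1}x^{k-1} - \cdots - c_0$ annihilates $\alpha$, and no non-zero polynomial of smaller degree does, because $\alpha, \ldots, T^{k-1}\alpha$ are independent. The Python sketch below is a minimal illustration, assuming a matrix with rational entries and exact `Fraction` arithmetic; polynomials are coefficient lists with the constant term first, and the function names are hypothetical.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Apply the matrix A (a list of rows) to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def solve_in_span(vectors, target):
    """Coefficients c with sum(c[i] * vectors[i]) == target, or None if target
    is not in the span. Assumes the given vectors are linearly independent."""
    m, n = len(vectors), len(target)
    # Augmented matrix [v_0 ... v_{m-1} | target], one row per coordinate.
    rows = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])] for i in range(n)]
    pivot_cols, r = [], 0
    for col in range(m):
        p = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][col]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    if any(row[m] != 0 for row in rows[r:]):
        return None  # inconsistent: a zero row with a non-zero right-hand side
    coeffs = [Fraction(0)] * m
    for i, col in enumerate(pivot_cols):
        coeffs[col] = rows[i][m]
    return coeffs


def t_annihilator(A, alpha):
    """Monic T-annihilator p_alpha as coefficients [c_0, ..., c_{k-1}, 1]."""
    powers = [[Fraction(x) for x in alpha]]
    if all(x == 0 for x in powers[0]):
        return [Fraction(1)]  # M(0; T) is all of F[x], generated by 1
    while True:
        nxt = mat_vec(A, powers[-1])
        c = solve_in_span(powers, nxt)
        if c is not None:
            # T^k alpha = sum_i c_i T^i alpha, so p_alpha = x^k - sum_i c_i x^i.
            return [-ci for ci in c] + [Fraction(1)]
        powers.append(nxt)
```

For the operator of Example 1, `t_annihilator([[0, 0], [1, 0]], [1, 0])` returns the coefficients of $x^2$, while the vector $(0, 1)$ gives $x$; the degree of the result is $\dim Z(\alpha; T)$, as in part (i).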


A particular consequence of this theorem is the following: If $\alpha$ happens
to be a cyclic vector for T, then the minimal polynomial for T must have
degree equal to the dimension of the space V; hence, the Cayley-Hamilton
theorem tells us that the minimal polynomial for T is the characteristic
polynomial for T. We shall prove later that for any T there is a vector $\alpha$ in
V which has the minimal polynomial of T for its annihilator. It will then
follow that T has a cyclic vector if and only if the minimal and characteristic
polynomials for T are identical. But it will take a little work for us
to see this.

Our plan is to study the general T by using operators which have a
cyclic vector. So, let us take a look at a linear operator U on a space W
of dimension k which has a cyclic vector $\alpha$. By Theorem 1, the vectors
$\alpha, U\alpha, \ldots, U^{k-1}\alpha$ form a basis for the space W, and the annihilator $p_\alpha$ of $\alpha$
is the minimal polynomial for U (and hence also the characteristic polynomial
for U). If we let $\alpha_i = U^{i-1}\alpha$, $i = 1, \ldots, k$, then the action of U
on the ordered basis $\mathcal{B} = \{\alpha_1, \ldots, \alpha_k\}$ is

$$\begin{aligned} U\alpha_i &= \alpha_{i+1}, \qquad i = 1, \ldots, k-1 \\ U\alpha_k &= -c_0\alpha_1 - c_1\alpha_2 - \cdots - c_{k-1}\alpha_k \end{aligned} \tag{7-1}$$

where $p_\alpha = c_0 + c_1x + \cdots + c_{k-1}x^{k-1} + x^k$. The expression for $U\alpha_k$
follows from the fact that $p_\alpha(U)\alpha = 0$, i.e.,

$$U^k\alpha + c_{k-1}U^{k-1}\alpha + \cdots + c_1U\alpha + c_0\alpha = 0.$$
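This says that the matrix of U in the ordered basis $\mathcal{B}$ is

$$\begin{bmatrix}
0 & 0 & 0 & \cdots & 0 & -c_0 \\
1 & 0 & 0 & \cdots & 0 & -c_1 \\
0 & 1 & 0 & \cdots & 0 & -c_2 \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
0 & 0 & 0 & \cdots & 1 & -c_{k-1}
\end{bmatrix}. \tag{7-2}$$

The matrix (7-2) is called the companion matrix of the monic polynomial
$p_\alpha$.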
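
The companion matrix (7-2) can be built directly from the coefficients $c_0, \ldots, c_{k-1}$, and one can confirm that $p_\alpha$ annihilates it, as part (iii) of Theorem 1 predicts for the matrix of U in the basis $\alpha_1, \ldots, \alpha_k$. A minimal Python sketch with exact fractions follows; the names `companion` and `poly_at_matrix` are illustrative, and the polynomial is evaluated by Horner's rule.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix (7-2) of x^k + c_{k-1} x^{k-1} + ... + c_0,
    where coeffs = [c_0, ..., c_{k-1}]."""
    k = len(coeffs)
    C = [[Fraction(0)] * k for _ in range(k)]
    for i in range(1, k):
        C[i][i - 1] = Fraction(1)  # U alpha_i = alpha_{i+1}
    for i in range(k):
        # Last column: U alpha_k = -c_0 alpha_1 - ... - c_{k-1} alpha_k.
        C[i][k - 1] = -Fraction(coeffs[i])
    return C


def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def poly_at_matrix(coeffs, M):
    """Evaluate the monic x^k + c_{k-1} x^{k-1} + ... + c_0 at M by Horner's rule."""
    n = len(M)
    I = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    R = I  # start from the leading coefficient 1
    for c in reversed(coeffs):  # fold in c_{k-1}, ..., c_0
        R = mat_mul(R, M)
        R = [[R[i][j] + c * I[i][j] for j in range(n)] for i in range(n)]
    return R
```

For instance, `companion([2, -3])` is the companion matrix of $x^2 - 3x + 2$, namely the rows `[0, -2]` and `[1, 3]`, and evaluating that polynomial at it gives the zero matrix.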


Theorem 2. If U is a linear operator on the finite-dimensional space
W, then U has a cyclic vector if and only if there is some ordered basis for W
in which U is represented by the companion matrix of the minimal polynomial
for U.


Proof. We have just observed that if U has a cyclic vector, then
there is such an ordered basis for W. Conversely, if we have some ordered
basis $\{\alpha_1, \ldots, \alpha_k\}$ for W in which U is represented by the companion
matrix of its minimal polynomial, it is obvious that $\alpha_1$ is a cyclic vector
for U. ∎


Corollary. If A is the companion matrix of a monic polynomial p, 
then p is both the minimal and the characteristic polynomial of A. 


Proof. One way to see this is to let U be the linear operator on
$F^k$ which is represented by A in the standard ordered basis, and to apply
Theorem 1 together with the Cayley-Hamilton theorem. Another method
is to use Theorem 1 to see that p is the minimal polynomial for A and to
verify by a direct calculation that p is the characteristic polynomial for
A. ∎
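
For experiments, the "direct calculation" of $\det(xI - A)$ can be done mechanically. The sketch below uses the Faddeev-LeVerrier recursion, a standard method that is not the one in the text, with exact fractions; because it divides by $1, \ldots, n$ it assumes a field of characteristic 0 such as the rationals. The function names are illustrative.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix (7-2) of x^k + c_{k-1} x^{k-1} + ... + c_0."""
    k = len(coeffs)
    C = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        if i > 0:
            C[i][i - 1] = Fraction(1)
        C[i][k - 1] = -Fraction(coeffs[i])
    return C


def char_poly(A):
    """Coefficients [c_0, ..., c_{n-1}, 1] of det(xI - A), by Faddeev-LeVerrier.

    Divides by 1, ..., n, so it assumes characteristic 0 (e.g. the rationals)."""
    n = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    coeffs = [Fraction(0)] * n + [Fraction(1)]  # c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    for k in range(1, n + 1):
        # M_k = A M_{k-1} + c_{n-k+1} I
        AM = [[sum(A[i][t] * M[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        M = [[AM[i][j] + (coeffs[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        # c_{n-k} = -trace(A M_k) / k
        trace = sum(A[i][t] * M[t][i] for i in range(n) for t in range(n))
        coeffs[n - k] = -trace / k
    return coeffs
```

Applied to `companion(c)` for any coefficient list `c`, the result is `c` followed by the leading 1, which is the content of the corollary for these examples.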


One last comment: if T is any linear operator on the space V and
$\alpha$ is any vector in V, then the operator U which T induces on the cyclic
subspace $Z(\alpha; T)$ has a cyclic vector, namely, $\alpha$. Thus $Z(\alpha; T)$ has an
ordered basis in which U is represented by the companion matrix of $p_\alpha$,
the T-annihilator of $\alpha$.


Exercises 


1. Let T be a linear operator on $F^2$. Prove that any non-zero vector which is not
a characteristic vector for T is a cyclic vector for T. Hence, prove that either T
has a cyclic vector or T is a scalar multiple of the identity operator.


2. Let T be the linear operator on $R^3$ which is represented in the standard ordered
basis by the matrix

$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

Prove that T has no cyclic vector. What is the T-cyclic subspace generated by the
vector $(1, -1, 3)$?


3. Let T be the linear operator on $C^3$ which is represented in the standard ordered
basis by the matrix

$$\begin{bmatrix} 1 & i & 0 \\ -1 & 2 & -i \\ 0 & 1 & 1 \end{bmatrix}.$$

Find the T-annihilator of the vector $(1, 0, 0)$. Find the T-annihilator of $(1, 0, i)$.


4. Prove that if $T^2$ has a cyclic vector, then T has a cyclic vector. Is the converse
true?


5. Let V be an n-dimensional vector space over the field F, and let N be a nilpotent
linear operator on V. Suppose $N^{n-1} \ne 0$, and let $\alpha$ be any vector in V such that
$N^{n-1}\alpha \ne 0$. Prove that $\alpha$ is a cyclic vector for N. What exactly is the matrix of N
in the ordered basis $\{\alpha, N\alpha, \ldots, N^{n-1}\alpha\}$?

6. Give a direct proof that if A is the companion matrix of the monic polynomial 
p, then p is the characteristic polynomial for A. 


7. Let V be an n-dimensional vector space, and let T be a linear operator on V.
Suppose that T is diagonalizable.

(a) If T has a cyclic vector, show that T has n distinct characteristic values.

(b) If T has n distinct characteristic values, and if $\{\alpha_1, \ldots, \alpha_n\}$ is a basis of
characteristic vectors for T, show that $\alpha = \alpha_1 + \cdots + \alpha_n$ is a cyclic vector for T.


8. Let T be a linear operator on the finite-dimensional vector space V. Suppose T 
has a cyclic vector. Prove that if U is any linear operator which commutes with T, 
then U is a polynomial in T. 


7.2. Cyclic Decompositions and the Rational Form


The primary purpose of this section is to prove that if T is any linear
operator on a finite-dimensional space V, then there exist vectors $\alpha_1, \ldots, \alpha_r$
in V such that

$$V = Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T).$$


In other words, we wish to prove that V is a direct sum of T-cyclic sub- 
spaces. This will show that T is the direct sum of a finite number of linear 
operators, each of which has a cyclic vector. The effect of this will be to 
reduce many questions about the general linear operator to similar ques- 
tions about an operator which has a cyclic vector. The theorem which we 
prove (Theorem 3) is one of the deepest results in linear algebra and has 
many interesting corollaries. 

The cyclic decomposition theorem is closely related to the following
question. Which T-invariant subspaces W have the property that there
exists a T-invariant subspace $W'$ such that $V = W \oplus W'$? If W is any
subspace of a finite-dimensional space V, then there exists a subspace $W'$
such that $V = W \oplus W'$. Usually there are many such subspaces $W'$ and
each of these is called complementary to W. We are asking when a T-invariant
subspace has a complementary subspace which is also invariant
under T.

Let us suppose that $V = W \oplus W'$ where both W and $W'$ are invariant
under T and then see what we can discover about the subspace W. Each
vector $\beta$ in V is of the form $\beta = \gamma + \gamma'$ where $\gamma$ is in W and $\gamma'$ is in $W'$.
If f is any polynomial over the scalar field, then

$$f(T)\beta = f(T)\gamma + f(T)\gamma'.$$

Since W and $W'$ are invariant under T, the vector $f(T)\gamma$ is in W and $f(T)\gamma'$
is in $W'$. Therefore $f(T)\beta$ is in W if and only if $f(T)\gamma' = 0$. What interests
us is the seemingly innocent fact that, if $f(T)\beta$ is in W, then $f(T)\beta = f(T)\gamma$.


Definition. Let T be a linear operator on a vector space V and let W
be a subspace of V. We say that W is T-admissible if


(i) W is invariant under T;
(ii) if $f(T)\beta$ is in W, there exists a vector $\gamma$ in W such that $f(T)\beta = f(T)\gamma$.


As we just showed, if W is invariant and has a complementary in- 
variant subspace, then W is admissible. One of the consequences of Theo- 
rem 3 will be the converse, so that admissibility characterizes those 
invariant subspaces which have complementary invariant subspaces. 

Let us indicate how the admissibility property is involved in the
attempt to obtain a decomposition

$$V = Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T).$$

Our basic method for arriving at such a decomposition will be to inductively
select the vectors $\alpha_1, \ldots, \alpha_r$. Suppose that by some process or another we
have selected $\alpha_1, \ldots, \alpha_j$ and the subspace

$$W_j = Z(\alpha_1; T) + \cdots + Z(\alpha_j; T)$$

is proper. We would like to find a non-zero vector $\alpha_{j+1}$ such that

$$W_j \cap Z(\alpha_{j+1}; T) = \{0\}$$

because the subspace $W_{j+1} = W_j \oplus Z(\alpha_{j+1}; T)$ would then come at least
one dimension nearer to exhausting V. But, why should any such $\alpha_{j+1}$
exist? If $\alpha_1, \ldots, \alpha_j$ have been chosen so that $W_j$ is a T-admissible subspace,
then it is rather easy to see that we can find a suitable $\alpha_{j+1}$. This is what
will make our proof of Theorem 3 work, even if that is not how we phrase
the argument.

Let W be a proper T-invariant subspace. Let us try to find a non-zero
vector $\alpha$ such that

$$W \cap Z(\alpha; T) = \{0\}. \tag{7-3}$$

We can choose some vector $\beta$ which is not in W. Consider the T-conductor
$S(\beta; W)$, which consists of all polynomials g such that $g(T)\beta$ is in W. Recall
that the monic polynomial $f = s(\beta; W)$ which generates the ideal $S(\beta; W)$
is also called the T-conductor of $\beta$ into W. The vector $f(T)\beta$ is in W. Now, if
W is T-admissible, there is a $\gamma$ in W with $f(T)\beta = f(T)\gamma$. Let $\alpha = \beta - \gamma$
and let g be any polynomial. Since $\beta - \alpha$ is in W, $g(T)\beta$ will be in W if and
only if $g(T)\alpha$ is in W; in other words, $S(\alpha; W) = S(\beta; W)$. Thus the
polynomial f is also the T-conductor of $\alpha$ into W. But $f(T)\alpha = 0$. That
tells us that $g(T)\alpha$ is in W if and only if $g(T)\alpha = 0$, i.e., the subspaces
$Z(\alpha; T)$ and W are independent (7-3) and f is the T-annihilator of $\alpha$.
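
This argument is constructive. A T-conductor $s(\beta; W)$ can be computed like an annihilator: look for the first power $T^k\beta$ lying in $W + \operatorname{span}(\beta, T\beta, \ldots, T^{k-1}\beta)$. The resulting monic polynomial of degree k is in $S(\beta; W)$, and a non-zero polynomial of smaller degree there would have produced an earlier dependence. The Python sketch below is a minimal illustration with exact fractions, assuming W is T-invariant and is given by a linearly independent list of vectors; the function names are hypothetical, and taking W to be $\{0\}$ (an empty list) recovers the T-annihilator.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Apply the matrix A (a list of rows) to the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def solve_in_span(vectors, target):
    """Coefficients c with sum(c[i] * vectors[i]) == target, or None if target
    is not in the span. Assumes the given vectors are linearly independent."""
    m, n = len(vectors), len(target)
    rows = [[Fraction(v[i]) for v in vectors] + [Fraction(target[i])] for i in range(n)]
    pivot_cols, r = [], 0
    for col in range(m):
        p = next((i for i in range(r, n) if rows[i][col] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        piv = rows[r][col]
        rows[r] = [x / piv for x in rows[r]]
        for i in range(n):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    if any(row[m] != 0 for row in rows[r:]):
        return None  # inconsistent system: target is not in the span
    coeffs = [Fraction(0)] * m
    for i, col in enumerate(pivot_cols):
        coeffs[col] = rows[i][m]
    return coeffs


def t_conductor(A, beta, W_basis):
    """Monic T-conductor s(beta; W) as [s_0, ..., s_{k-1}, 1].

    Assumes W is T-invariant and W_basis is a linearly independent list
    spanning W; W_basis = [] gives the T-annihilator of beta."""
    W = [[Fraction(x) for x in w] for w in W_basis]
    powers = [[Fraction(x) for x in beta]]
    if solve_in_span(W, powers[0]) is not None:
        return [Fraction(1)]  # beta already lies in W
    while True:
        nxt = mat_vec(A, powers[-1])
        c = solve_in_span(W + powers, nxt)
        if c is not None:
            # T^k beta = (vector in W) + sum_i c_i T^i beta, so s = x^k - sum_i c_i x^i.
            return [-ci for ci in c[len(W):]] + [Fraction(1)]
        powers.append(nxt)
```

As an illustration, take the matrix A of Example 3 below and let W be the null space of T - 2I, spanned by $(2, 1, 0)$ and $(2, 0, 1)$. The conductor of $\epsilon_1$ into W comes out as $x - 1$, since $(T - I)\epsilon_1 = (4, -1, 3)$ lies in W. Following the argument above with $\gamma = (4, -1, 3)$ gives $\alpha = \epsilon_1 - \gamma = (-3, 1, -3)$, and indeed $T\alpha = \alpha$.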


Theorem 3 (Cyclic Decomposition Theorem). Let T be a linear
operator on a finite-dimensional vector space V and let $W_0$ be a proper T-admissible
subspace of V. There exist non-zero vectors $\alpha_1, \ldots, \alpha_r$ in V with
respective T-annihilators $p_1, \ldots, p_r$ such that

(i) $V = W_0 \oplus Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T)$;

(ii) $p_k$ divides $p_{k-1}$, $k = 2, \ldots, r$.

Furthermore, the integer r and the annihilators $p_1, \ldots, p_r$ are uniquely
determined by (i), (ii), and the fact that no $\alpha_k$ is 0.

Proof. The proof is rather long; hence, we shall divide it into four
steps. For the first reading it may seem easier to take $W_0 = \{0\}$, although
it does not produce any substantial simplification. Throughout the proof,
we shall abbreviate $f(T)\beta$ to $f\beta$.


Step 1. There exist non-zero vectors $\beta_1, \ldots, \beta_r$ in V such that

(a) $V = W_0 + Z(\beta_1; T) + \cdots + Z(\beta_r; T)$;

(b) if $1 \le k \le r$ and

$$W_k = W_0 + Z(\beta_1; T) + \cdots + Z(\beta_k; T)$$

then the conductor $p_k = s(\beta_k; W_{k-1})$ has maximum degree among all T-conductors
into the subspace $W_{k-1}$, i.e., for every k

$$\deg p_k = \max_{\alpha \text{ in } V} \deg s(\alpha; W_{k-1}).$$

This step depends only upon the fact that $W_0$ is an invariant subspace.
If W is a proper T-invariant subspace, then

$$0 < \max_{\alpha} \deg s(\alpha; W) \le \dim V$$

and we can choose a vector $\beta$ so that $\deg s(\beta; W)$ attains that maximum.
The subspace $W + Z(\beta; T)$ is then T-invariant and has dimension larger
than $\dim W$. Apply this process to $W = W_0$ to obtain $\beta_1$. If $W_1 = W_0 + Z(\beta_1; T)$
is still proper, then apply the process to $W_1$ to obtain $\beta_2$. Continue
in that manner. Since $\dim W_k > \dim W_{k-1}$, we must reach $W_r = V$ in not
more than $\dim V$ steps.


Step 2. Let $\beta_1, \ldots, \beta_r$ be non-zero vectors which satisfy conditions
(a) and (b) of Step 1. Fix k, $1 \le k \le r$. Let $\beta$ be any vector in V and let
$f = s(\beta; W_{k-1})$. If

$$f\beta = \beta_0 + \sum_{1 \le i < k} g_i\beta_i, \qquad \beta_0 \text{ in } W_0$$

then f divides each polynomial $g_i$ and $\beta_0 = f\gamma_0$, where $\gamma_0$ is in $W_0$.

If $k = 1$, this is just the statement that $W_0$ is T-admissible. In order
to prove the assertion for $k > 1$, apply the division algorithm:

$$g_i = fh_i + r_i, \qquad r_i = 0 \text{ or } \deg r_i < \deg f. \tag{7-4}$$

We wish to show that $r_i = 0$ for each i. Let

$$\gamma = \beta - \sum_{i=1}^{k-1} h_i\beta_i. \tag{7-5}$$

Since $\gamma - \beta$ is in $W_{k-1}$,

$$s(\gamma; W_{k-1}) = s(\beta; W_{k-1}) = f.$$

Furthermore

$$f\gamma = \beta_0 + \sum_{i=1}^{k-1} r_i\beta_i. \tag{7-6}$$

Suppose that some $r_i$ is different from 0. We shall deduce a contradiction.
Let j be the largest index i for which $r_i \ne 0$. Then

$$f\gamma = \beta_0 + \sum_{i=1}^{j} r_i\beta_i, \qquad r_j \ne 0 \text{ and } \deg r_j < \deg f. \tag{7-7}$$

Let $p = s(\gamma; W_{j-1})$. Since $W_{k-1}$ contains $W_{j-1}$, the conductor $f = s(\gamma; W_{k-1})$
must divide p:

$$p = fg.$$

Apply $g(T)$ to both sides of (7-7):

$$p\gamma = gf\gamma = gr_j\beta_j + g\beta_0 + \sum_{1 \le i < j} gr_i\beta_i. \tag{7-8}$$

By definition, $p\gamma$ is in $W_{j-1}$, and the last two terms on the right side of (7-8)
are in $W_{j-1}$. Therefore, $gr_j\beta_j$ is in $W_{j-1}$. Now we use condition (b) of Step 1:

$$\begin{aligned} \deg(gr_j) &\ge \deg s(\beta_j; W_{j-1}) \\ &= \deg p_j \\ &\ge \deg s(\gamma; W_{j-1}) \\ &= \deg p \\ &= \deg(fg). \end{aligned}$$

Thus $\deg r_j \ge \deg f$, and that contradicts the choice of j. We now know
that f divides each $g_i$ and hence that $\beta_0 = f\gamma$. Since $W_0$ is T-admissible,
$\beta_0 = f\gamma_0$ where $\gamma_0$ is in $W_0$. We remark in passing that Step 2 is a strengthened
form of the assertion that each of the subspaces $W_1, W_2, \ldots, W_r$ is
T-admissible.


Step 3. There exist non-zero vectors $\alpha_1, \ldots, \alpha_r$ in V which
satisfy conditions (i) and (ii) of Theorem 3.

Start with vectors $\beta_1, \ldots, \beta_r$ as in Step 1. Fix k, $1 \le k \le r$. We apply
Step 2 to the vector $\beta = \beta_k$ and the T-conductor $f = p_k$. We obtain

$$p_k\beta_k = p_k\gamma_0 + \sum_{1 \le i < k} p_kh_i\beta_i \tag{7-9}$$

where $\gamma_0$ is in $W_0$ and $h_1, \ldots, h_{k-1}$ are polynomials. Let

$$\alpha_k = \beta_k - \gamma_0 - \sum_{1 \le i < k} h_i\beta_i. \tag{7-10}$$

Since $\beta_k - \alpha_k$ is in $W_{k-1}$,

$$s(\alpha_k; W_{k-1}) = s(\beta_k; W_{k-1}) = p_k \tag{7-11}$$

and since $p_k\alpha_k = 0$, we have

$$W_{k-1} \cap Z(\alpha_k; T) = \{0\}. \tag{7-12}$$

Because each $\alpha_k$ satisfies (7-11) and (7-12), it follows that

$$W_k = W_0 \oplus Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_k; T)$$

and that $p_k$ is the T-annihilator of $\alpha_k$. In other words, the vectors $\alpha_1, \ldots, \alpha_r$
define the same sequence of subspaces $W_1, W_2, \ldots$ as do the vectors
$\beta_1, \ldots, \beta_r$, and the T-conductors $p_k = s(\alpha_k; W_{k-1})$ have the same maximality
properties (condition (b) of Step 1). The vectors $\alpha_1, \ldots, \alpha_r$ have
the additional property that the subspaces $W_0, Z(\alpha_1; T), Z(\alpha_2; T), \ldots$ are
independent. It is therefore easy to verify condition (ii) in Theorem 3.
Since $p_i\alpha_i = 0$ for each i, we have the trivial relation

$$p_k\alpha_k = 0 + p_1\alpha_1 + \cdots + p_{k-1}\alpha_{k-1}.$$

Apply Step 2 with $\beta_1, \ldots, \beta_k$ replaced by $\alpha_1, \ldots, \alpha_k$ and with $\beta = \alpha_k$.
Conclusion: $p_k$ divides each $p_i$ with $i < k$.

Step 4. The number r and the polynomials $p_1, \ldots, p_r$ are uniquely
determined by the conditions of Theorem 3.

Suppose that in addition to the vectors $\alpha_1, \ldots, \alpha_r$ in Theorem 3 we
have non-zero vectors $\gamma_1, \ldots, \gamma_s$ with respective T-annihilators $g_1, \ldots, g_s$
such that

$$\begin{aligned} &V = W_0 \oplus Z(\gamma_1; T) \oplus \cdots \oplus Z(\gamma_s; T) \\ &g_k \text{ divides } g_{k-1}, \qquad k = 2, \ldots, s. \end{aligned} \tag{7-13}$$

We shall show that $r = s$ and $p_i = g_i$ for each i.

It is very easy to see that $p_1 = g_1$. The polynomial $g_1$ is determined
from (7-13) as the T-conductor of V into $W_0$. Let $S(V; W_0)$ be the collection
of polynomials f such that $f\beta$ is in $W_0$ for every $\beta$ in V, i.e., polynomials f
such that the range of $f(T)$ is contained in $W_0$. Then $S(V; W_0)$ is a non-zero
ideal in the polynomial algebra. The polynomial $g_1$ is the monic generator
of that ideal, for this reason. Each $\beta$ in V has the form

$$\beta = \beta_0 + f_1\gamma_1 + \cdots + f_s\gamma_s, \qquad \beta_0 \text{ in } W_0$$

and so

$$g_1\beta = g_1\beta_0 + \sum_{i=1}^{s} g_1f_i\gamma_i.$$

Since each $g_i$ divides $g_1$, we have $g_1\gamma_i = 0$ for all i and $g_1\beta = g_1\beta_0$ is in $W_0$.
Thus $g_1$ is in $S(V; W_0)$. Since $g_1$ is the monic polynomial of least degree
which sends $\gamma_1$ into $W_0$, we see that $g_1$ is the monic polynomial of least degree
in the ideal $S(V; W_0)$. By the same argument, $p_1$ is the generator of that
ideal, so $p_1 = g_1$.

If f is a polynomial and W is a subspace of V, we shall employ the
shorthand $fW$ for the set of all vectors $f\alpha$ with $\alpha$ in W. We have left to the
exercises the proofs of the following three facts.

1. $fZ(\alpha; T) = Z(f\alpha; T)$.

2. If $V = V_1 \oplus \cdots \oplus V_k$, where each $V_i$ is invariant under T, then
$fV = fV_1 \oplus \cdots \oplus fV_k$.

3. If $\alpha$ and $\gamma$ have the same T-annihilator, then $f\alpha$ and $f\gamma$ have the
same T-annihilator and (therefore)

$$\dim Z(f\alpha; T) = \dim Z(f\gamma; T).$$
Now, we proceed by induction to show that $r = s$ and $p_i = g_i$ for
$i = 2, \ldots, r$. The argument consists of counting dimensions in the right
way. We shall give the proof that if $r \ge 2$ then $p_2 = g_2$, and from that the
induction should be clear. Suppose that $r \ge 2$. Then

$$\dim W_0 + \dim Z(\alpha_1; T) < \dim V.$$

Since we know that $p_1 = g_1$, we know that $Z(\alpha_1; T)$ and $Z(\gamma_1; T)$ have the
same dimension. Therefore,

$$\dim W_0 + \dim Z(\gamma_1; T) < \dim V$$

which shows that $s \ge 2$. Now it makes sense to ask whether or not $p_2 = g_2$.
From the two decompositions of V, we obtain two decompositions of the
subspace $p_2V$:

$$\begin{aligned} p_2V &= p_2W_0 \oplus Z(p_2\alpha_1; T) \\ p_2V &= p_2W_0 \oplus Z(p_2\gamma_1; T) \oplus \cdots \oplus Z(p_2\gamma_s; T). \end{aligned} \tag{7-14}$$

We have made use of facts (1) and (2) above and we have used the fact
that $p_2\alpha_i = 0$, $i \ge 2$. Since we know that $p_1 = g_1$, fact (3) above tells us
that $Z(p_2\alpha_1; T)$ and $Z(p_2\gamma_1; T)$ have the same dimension. Hence, it is
apparent from (7-14) that

$$\dim Z(p_2\gamma_i; T) = 0, \qquad i \ge 2.$$

We conclude that $p_2\gamma_2 = 0$ and $g_2$ divides $p_2$. The argument can be reversed
to show that $p_2$ divides $g_2$. Therefore $p_2 = g_2$. ∎


Corollary. If T is a linear operator on a finite-dimensional vector 
space, then every T-admissible subspace has a complementary subspace which 
is also invariant under T. 


Proof. Let $W_0$ be an admissible subspace of V. If $W_0 = V$, the
complement we seek is $\{0\}$. If $W_0$ is proper, apply Theorem 3 and let

$$W_0' = Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T).$$

Then $W_0'$ is invariant under T and $V = W_0 \oplus W_0'$. ∎
Corollary. Let T be a linear operator on a finite-dimensional vector
space V.


(a) There exists a vector $\alpha$ in V such that the T-annihilator of $\alpha$ is the
minimal polynomial for T.

(b) T has a cyclic vector if and only if the characteristic and minimal
polynomials for T are identical.

Proof. If $V = \{0\}$, the results are trivially true. If $V \ne \{0\}$, let

$$V = Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T) \tag{7-15}$$

where the T-annihilators $p_1, \ldots, p_r$ are such that $p_{k+1}$ divides $p_k$, $1 \le k \le r - 1$.
As we noted in the proof of Theorem 3, it follows easily that $p_1$ is the
minimal polynomial for T, i.e., the T-conductor of V into $\{0\}$. We have
proved (a).

We saw in Section 7.1 that, if T has a cyclic vector, the minimal
polynomial for T coincides with the characteristic polynomial. The content
of (b) is in the converse. Choose any $\alpha$ as in (a). If the degree of the minimal
polynomial is $\dim V$, then $V = Z(\alpha; T)$. ∎


Theorem 4 (Generalized Cayley-Hamilton Theorem). Let T be
a linear operator on a finite-dimensional vector space V. Let p and f be the
minimal and characteristic polynomials for T, respectively.

(i) p divides f.

(ii) p and f have the same prime factors, except for multiplicities.

(iii) If

$$p = f_1^{r_1} \cdots f_k^{r_k} \tag{7-16}$$

is the prime factorization of p, then

$$f = f_1^{d_1} \cdots f_k^{d_k} \tag{7-17}$$

where $d_i$ is the nullity of $f_i(T)^{r_i}$ divided by the degree of $f_i$.


Proof. We disregard the trivial case $V = \{0\}$. To prove (i) and
(ii), consider a cyclic decomposition (7-15) of V obtained from Theorem 3.
As we noted in the proof of the second corollary, $p_1 = p$. Let $U_i$ be the
restriction of T to $Z(\alpha_i; T)$. Then $U_i$ has a cyclic vector and so $p_i$ is both
the minimal polynomial and the characteristic polynomial for $U_i$. Therefore,
the characteristic polynomial f is the product $f = p_1 \cdots p_r$. That is
evident from the block form (6-14) which the matrix of T assumes in a
suitable basis. Clearly $p_1 = p$ divides f, and this proves (i). Obviously any
prime divisor of p is a prime divisor of f. Conversely, a prime divisor of
$f = p_1 \cdots p_r$ must divide one of the factors $p_i$, which in turn divides $p_1 = p$.

Let (7-16) be the prime factorization of p. We employ the primary
decomposition theorem (Theorem 12 of Chapter 6). It tells us that, if $V_i$
is the null space of $f_i(T)^{r_i}$, then

$$V = V_1 \oplus \cdots \oplus V_k \tag{7-18}$$

and $f_i^{r_i}$ is the minimal polynomial of the operator $T_i$, obtained by restricting
T to the (invariant) subspace $V_i$. Apply part (ii) of the present theorem to
the operator $T_i$. Since its minimal polynomial is a power of the prime $f_i$,
the characteristic polynomial for $T_i$ has the form $f_i^{d_i}$, where $d_i \ge r_i$. Obviously

$$d_i = \frac{\dim V_i}{\deg f_i}$$

and (almost by definition) $\dim V_i = \operatorname{nullity} f_i(T)^{r_i}$. Since T is the direct
sum of the operators $T_1, \ldots, T_k$, the characteristic polynomial f is the
product

$$f = f_1^{d_1} \cdots f_k^{d_k}.$$

∎
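
When a prime factor has the form $f_i = x - c$, so that $\deg f_i = 1$, part (iii) reduces to a rank computation: $d_i = \operatorname{nullity}(T - cI)^{r_i}$. The Python sketch below covers only such linear factors, which is an illustrative simplification, and takes the exponent $r_i$ from the minimal polynomial as a known input; it uses exact fractions, and the function names are hypothetical.

```python
from fractions import Fraction


def rank(M):
    """Rank of a matrix (list of rows) by Gaussian elimination over exact fractions."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def mat_mul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def exponent_in_char_poly(A, c, r):
    """d for the prime factor x - c, per Theorem 4(iii): nullity of (A - cI)^r
    divided by deg(x - c) = 1, where r is the exponent of x - c in the
    minimal polynomial."""
    n = len(A)
    N = [[Fraction(A[i][j]) - (c if i == j else 0) for j in range(n)] for i in range(n)]
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(r):
        P = mat_mul(P, N)
    return n - rank(P)
```

For the matrix A of Example 3 below, whose minimal polynomial is $(x - 1)(x - 2)$, this gives $d = 1$ for $x - 1$ and $d = 2$ for $x - 2$, matching $f = (x - 1)(x - 2)^2$. For the nilpotent matrix of Example 1 in Section 7.1 one must use $r = 2$ to obtain $d = 2$, i.e. $f = x^2$; using the wrong exponent $r = 1$ would give 1.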


Corollary. If T is a nilpotent linear operator on a vector space of
dimension n, then the characteristic polynomial for T is $x^n$.


Now let us look at the matrix analogue of the cyclic decomposition
theorem. If we have the operator T and the direct-sum decomposition of
Theorem 3, let $\mathcal{B}_i$ be the 'cyclic ordered basis'

$$\{\alpha_i, T\alpha_i, \ldots, T^{k_i-1}\alpha_i\}$$

for $Z(\alpha_i; T)$. Here $k_i$ denotes the dimension of $Z(\alpha_i; T)$, that is, the degree
of the annihilator $p_i$. The matrix of the induced operator $T_i$ in the ordered
basis $\mathcal{B}_i$ is the companion matrix of the polynomial $p_i$. Thus, if we let $\mathcal{B}$ be
the ordered basis for V which is the union of the $\mathcal{B}_i$ arranged in the order
$\mathcal{B}_1, \ldots, \mathcal{B}_r$, then the matrix of T in the ordered basis $\mathcal{B}$ will be

$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A_r \end{bmatrix} \tag{7-19}$$

where $A_i$ is the $k_i \times k_i$ companion matrix of $p_i$. An $n \times n$ matrix A,
which is the direct sum (7-19) of companion matrices of non-scalar monic
polynomials $p_1, \ldots, p_r$ such that $p_{i+1}$ divides $p_i$ for $i = 1, \ldots, r - 1$,
will be said to be in rational form. The cyclic decomposition theorem
tells us the following concerning matrices.


Theorem 5. Let F be a field and let B be an $n \times n$ matrix over F.
Then B is similar over the field F to one and only one matrix which is in
rational form.


Proof. Let T be the linear operator on $F^n$ which is represented by
B in the standard ordered basis. As we have just observed, there is some
ordered basis for $F^n$ in which T is represented by a matrix A in rational
form. Then B is similar to this matrix A. Suppose B is similar over F to
another matrix C which is in rational form. This means simply that there
is some ordered basis for $F^n$ in which the operator T is represented by the
matrix C. If C is the direct sum of companion matrices $C_i$ of monic polynomials
$g_1, \ldots, g_s$ such that $g_{i+1}$ divides $g_i$ for $i = 1, \ldots, s - 1$, then it
is apparent that we shall have non-zero vectors $\beta_1, \ldots, \beta_s$ in V with T-annihilators
$g_1, \ldots, g_s$ such that

$$V = Z(\beta_1; T) \oplus \cdots \oplus Z(\beta_s; T).$$

But then by the uniqueness statement in the cyclic decomposition theorem,
the polynomials $g_i$ are identical with the polynomials $p_i$ which define the
matrix A. Thus $C = A$. ∎


The polynomials $p_1, \ldots, p_r$ are called the invariant factors for
the matrix B. In Section 7.4, we shall describe an algorithm for calculating
the invariant factors of a given matrix B. The fact that it is possible to
compute these polynomials by means of a finite number of rational operations
on the entries of B is what gives the rational form its name.
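
Once the invariant factors are known, assembling the rational form (7-19) is mechanical, provided the conditions in the definition hold: each $p_i$ monic and non-scalar, and $p_{i+1}$ dividing $p_i$. The Python sketch below checks those conditions and builds the block-diagonal matrix; polynomials are coefficient lists with the constant term first, and the names are illustrative. It does not compute invariant factors from a given matrix, which is the job of the algorithm of Section 7.4.

```python
from fractions import Fraction


def poly_rem(num, den):
    """Remainder of num divided by den (coefficient lists, constant term first)."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    while len(num) >= len(den) and any(num):
        shift = len(num) - len(den)
        factor = num[-1] / den[-1]
        for i, d in enumerate(den):
            num[shift + i] -= factor * d
        num.pop()  # the leading coefficient has been cancelled
    return num


def companion(p):
    """Companion matrix (7-2) of the monic p = [c_0, ..., c_{k-1}, 1]."""
    k = len(p) - 1
    C = [[Fraction(0)] * k for _ in range(k)]
    for i in range(k):
        if i > 0:
            C[i][i - 1] = Fraction(1)
        C[i][k - 1] = -Fraction(p[i])
    return C


def rational_form(invariant_factors):
    """Block-diagonal matrix (7-19) built from p_1, ..., p_r."""
    for p in invariant_factors:
        if len(p) < 2 or p[-1] != 1:
            raise ValueError("each p_i must be monic of degree at least 1")
    for p_i, p_next in zip(invariant_factors, invariant_factors[1:]):
        if any(c != 0 for c in poly_rem(p_i, p_next)):
            raise ValueError("each p_{i+1} must divide p_i")
    blocks = [companion(p) for p in invariant_factors]
    n = sum(len(b) for b in blocks)
    R = [[Fraction(0)] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, x in enumerate(row):
                R[offset + i][offset + j] = x
        offset += len(b)
    return R
```

With invariant factors $x^2 - 3x + 2$ and $x - 2$, the call `rational_form([[2, -3, 1], [-2, 1]])` reproduces the matrix B of Example 3 below; a list such as `[[-2, 1], [-1, 1]]` is rejected because $x - 1$ does not divide $x - 2$.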


EXAMPLE 2. Suppose that V is a two-dimensional vector space over
the field F and T is a linear operator on V. The possibilities for the cyclic
subspace decomposition for T are very limited. For, if the minimal polynomial
for T has degree 2, it is equal to the characteristic polynomial for
T and T has a cyclic vector. Thus there is some ordered basis for V in
which T is represented by the companion matrix of its characteristic
polynomial. If, on the other hand, the minimal polynomial for T has degree
1, then T is a scalar multiple of the identity operator. If $T = cI$, then for
any two linearly independent vectors $\alpha_1$ and $\alpha_2$ in V we have

$$V = Z(\alpha_1; T) \oplus Z(\alpha_2; T), \qquad p_1 = p_2 = x - c.$$

For matrices, this analysis says that every $2 \times 2$ matrix over the field F
is similar over F to exactly one matrix of the types

$$\begin{bmatrix} c & 0 \\ 0 & c \end{bmatrix}, \qquad \begin{bmatrix} 0 & -c_0 \\ 1 & -c_1 \end{bmatrix},$$

the second being the companion matrix of $x^2 + c_1x + c_0$.
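
Example 2 amounts to a decision procedure for $2 \times 2$ matrices: if the matrix is scalar it is already in rational form; otherwise its rational form is the companion matrix of $x^2 - (\operatorname{tr} A)x + \det A$. A minimal Python sketch with exact fractions (the function name is illustrative):

```python
from fractions import Fraction


def rational_form_2x2(M):
    """Rational form of a 2 x 2 matrix, following Example 2.

    A scalar matrix cI is already in rational form (invariant factors x - c, x - c).
    Otherwise the minimal polynomial has degree 2, equals the characteristic
    polynomial x^2 + c_1 x + c_0, and the rational form is its companion matrix."""
    (m11, m12), (m21, m22) = [[Fraction(x) for x in row] for row in M]
    if m12 == 0 and m21 == 0 and m11 == m22:
        return [[m11, Fraction(0)], [Fraction(0), m11]]
    c1 = -(m11 + m22)           # minus the trace
    c0 = m11 * m22 - m12 * m21  # the determinant
    return [[Fraction(0), -c0], [Fraction(1), -c1]]
```

For example, the matrix with rows `[1, 2]` and `[3, 4]` has trace 5 and determinant -2, so its rational form has rows `[0, 2]` and `[1, 5]`.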


EXAMPLE 3. Let T be the linear operator on $R^3$ which is represented
by the matrix

$$A = \begin{bmatrix} 5 & -6 & -6 \\ -1 & 4 & 2 \\ 3 & -6 & -4 \end{bmatrix}$$

in the standard ordered basis. We have computed earlier that the characteristic
polynomial for T is $f = (x - 1)(x - 2)^2$ and the minimal
polynomial for T is $p = (x - 1)(x - 2)$. Thus we know that in the cyclic
decomposition for T the first vector $\alpha_1$ will have p as its T-annihilator.
Since we are operating in a three-dimensional space, there can be only one
further vector, $\alpha_2$. It must generate a cyclic subspace of dimension 1, i.e.,
it must be a characteristic vector for T. Its T-annihilator $p_2$ must be
$(x - 2)$, because we must have $pp_2 = f$. Notice that this tells us immediately
that the matrix A is similar to the matrix

$$B = \begin{bmatrix} 0 & -2 & 0 \\ 1 & 3 & 0 \\ 0 & 0 & 2 \end{bmatrix}$$

that is, that T is represented by B in some ordered basis. How can we find
suitable vectors $\alpha_1$ and $\alpha_2$? Well, we know that any vector which generates
a T-cyclic subspace of dimension 2 is a suitable $\alpha_1$. So let's just try $\epsilon_1$. We
have

$$T\epsilon_1 = (5, -1, 3)$$

which is not a scalar multiple of $\epsilon_1$; hence $Z(\epsilon_1; T)$ has dimension 2. This
space consists of all vectors $a\epsilon_1 + b(T\epsilon_1)$:

$$a(1, 0, 0) + b(5, -1, 3) = (a + 5b, -b, 3b)$$

or, all vectors $(x_1, x_2, x_3)$ satisfying $x_3 = -3x_2$. Now what we want is
a vector $\alpha_2$ such that $T\alpha_2 = 2\alpha_2$ and $Z(\alpha_2; T)$ is disjoint from $Z(\alpha_1; T)$.
Since $\alpha_2$ is to be a characteristic vector for T, the space $Z(\alpha_2; T)$ will simply
be the one-dimensional space spanned by $\alpha_2$, and so what we require is that
$\alpha_2$ not be in $Z(\epsilon_1; T)$. If $\alpha = (x_1, x_2, x_3)$, one can easily compute that
$T\alpha = 2\alpha$ if and only if $x_1 = 2x_2 + 2x_3$. Thus $\alpha_2 = (2, 1, 0)$ satisfies $T\alpha_2 = 2\alpha_2$
and generates a T-cyclic subspace disjoint from $Z(\epsilon_1; T)$. The reader
should verify directly that the matrix of T in the ordered basis

$$\{(1, 0, 0), (5, -1, 3), (2, 1, 0)\}$$

is the matrix B above.
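
The verification asked of the reader amounts to checking $AP = PB$ for the matrix P whose columns are the three basis vectors, together with $\det P \ne 0$; this is equivalent to $P^{-1}AP = B$ and avoids computing $P^{-1}$. A short Python check in integer arithmetic (the helper names are illustrative):

```python
def mat_mul(X, Y):
    """Product of matrices given as lists of rows."""
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[5, -6, -6], [-1, 4, 2], [3, -6, -4]]
B = [[0, -2, 0], [1, 3, 0], [0, 0, 2]]
# The columns of P are the basis vectors (1, 0, 0), (5, -1, 3), (2, 1, 0).
P = [[1, 5, 2], [0, -1, 1], [0, 3, 0]]

# P is invertible and AP = PB, which is the same as P^{-1} A P = B.
print(det3(P))                         # -3
print(mat_mul(A, P) == mat_mul(P, B))  # True
```

Column by column, the check says $T\epsilon_1 = (5, -1, 3)$, $T(5, -1, 3) = -2\epsilon_1 + 3(5, -1, 3)$, and $T(2, 1, 0) = 2(2, 1, 0)$, which is exactly what the columns of B record.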


EXAMPLE 4. Suppose that T is a diagonalizable linear operator on V.
It is interesting to relate a cyclic decomposition for T to a basis which
diagonalizes the matrix of T. Let $c_1, \ldots, c_k$ be the distinct characteristic
values of T and let $V_i$ be the space of characteristic vectors associated with
the characteristic value $c_i$. Then

$$V = V_1 \oplus \cdots \oplus V_k$$

and if $d_i = \dim V_i$ then

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}$$

is the characteristic polynomial for T. If $\alpha$ is a vector in V, it is easy to
relate the cyclic subspace $Z(\alpha; T)$ to the subspaces $V_1, \ldots, V_k$. There are
unique vectors $\beta_1, \ldots, \beta_k$ such that $\beta_i$ is in $V_i$ and

$$\alpha = \beta_1 + \cdots + \beta_k.$$

Since $T\beta_i = c_i\beta_i$, we have

$$f(T)\alpha = f(c_1)\beta_1 + \cdots + f(c_k)\beta_k \tag{7-20}$$

for every polynomial f. Given any scalars $t_1, \ldots, t_k$ there exists a polynomial
f such that $f(c_i) = t_i$, $1 \le i \le k$. Therefore, $Z(\alpha; T)$ is just the
subspace spanned by the vectors $\beta_1, \ldots, \beta_k$. What is the annihilator of $\alpha$?
According to (7-20), we have $f(T)\alpha = 0$ if and only if $f(c_i)\beta_i = 0$ for each i.
In other words, $f(T)\alpha = 0$ provided $f(c_i) = 0$ for each i such that $\beta_i \ne 0$.
Accordingly, the annihilator of $\alpha$ is the product

$$\prod_{\beta_i \ne 0} (x - c_i). \tag{7-21}$$

Now, let $\mathcal{B}_i = \{\beta_1^i, \ldots, \beta_{d_i}^i\}$ be an ordered basis for $V_i$. Let

$$r = \max_i d_i.$$

We define vectors $\alpha_1, \ldots, \alpha_r$ by

$$\alpha_j = \sum_{d_i \ge j} \beta_j^i, \qquad 1 \le j \le r. \tag{7-22}$$

The cyclic subspace $Z(\alpha_j; T)$ is the subspace spanned by the vectors $\beta_j^i$, as
i runs over those indices for which $d_i \ge j$. The T-annihilator of $\alpha_j$ is

$$p_j = \prod_{d_i \ge j} (x - c_i). \tag{7-23}$$

We have

$$V = Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T)$$

because each $\beta_j^i$ belongs to one and only one of the subspaces $Z(\alpha_1; T), \ldots, Z(\alpha_r; T)$
and $\mathcal{B} = (\mathcal{B}_1, \ldots, \mathcal{B}_k)$ is a basis for V. By (7-23), $p_{j+1}$ divides $p_j$.
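
Formula (7-23) turns the invariant factors of a diagonalizable operator into simple bookkeeping on the multiplicities $d_i$. The Python sketch below is a minimal illustration, assuming the distinct characteristic values and their multiplicities are already known and passed as a dictionary; polynomials are coefficient lists with the constant term first, and the names are hypothetical.

```python
from fractions import Fraction


def poly_mul(p, q):
    """Product of two polynomials given as coefficient lists, constant term first."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def diagonalizable_invariant_factors(multiplicities):
    """Invariant factors (7-23) of a diagonalizable operator.

    multiplicities maps each distinct characteristic value c_i to d_i = dim V_i;
    p_j is the product of (x - c_i) over the indices i with d_i >= j."""
    r = max(multiplicities.values())
    factors = []
    for j in range(1, r + 1):
        p = [Fraction(1)]
        for c, d in sorted(multiplicities.items()):
            if d >= j:
                p = poly_mul(p, [-Fraction(c), Fraction(1)])
        factors.append(p)
    return factors
```

With characteristic values 1 and 2 of multiplicities 1 and 2 (the situation of Example 3, whose operator is diagonalizable because its minimal polynomial has distinct roots) the result is $(x - 1)(x - 2)$ followed by $x - 2$; a scalar operator cI on a two-dimensional space gives $x - c$ twice, as in Example 2.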


Exercises 


1. Let T be the linear operator on $F^2$ which is represented in the standard ordered
basis by the matrix

$$\begin{bmatrix} 0 & 0 \\ 1 & 0 \end{bmatrix}.$$

Let $\alpha_1 = (0, 1)$. Show that $F^2 \ne Z(\alpha_1; T)$ and that there is no non-zero vector $\alpha_2$
in $F^2$ with $Z(\alpha_2; T)$ disjoint from $Z(\alpha_1; T)$.
2. Let T be a linear operator on the finite-dimensional space V, and let R be
the range of T.

(a) Prove that R has a complementary T-invariant subspace if and only if R
is independent of the null space N of T.

(b) If R and N are independent, prove that N is the unique T-invariant subspace
complementary to R.


3. Let T be the linear operator on $R^3$ which is represented in the standard ordered
basis by the matrix

Let W be the null space of $T - 2I$. Prove that W has no complementary T-invariant
subspace. (Hint: Let $\beta = \epsilon_1$ and observe that $(T - 2I)\beta$ is in W. Prove there is
no $\alpha$ in W with $(T - 2I)\beta = (T - 2I)\alpha$.)


4. Let T be the linear operator on $F^4$ which is represented in the standard ordered
basis by the matrix

$$\begin{bmatrix} c & 0 & 0 & 0 \\ 1 & c & 0 & 0 \\ 0 & 1 & c & 0 \\ 0 & 0 & 1 & c \end{bmatrix}.$$

Let W be the null space of $T - cI$.

(a) Prove that W is the subspace spanned by $\epsilon_4$.

(b) Find the monic generators of the ideals $S(\epsilon_4; W)$, $S(\epsilon_3; W)$, $S(\epsilon_2; W)$,
$S(\epsilon_1; W)$.


5. Let T be a linear operator on the vector space V over the field F. If f is a polynomial
over F and $\alpha$ is in V, let $f\alpha = f(T)\alpha$. If $V_1, \ldots, V_k$ are T-invariant subspaces
and $V = V_1 \oplus \cdots \oplus V_k$, show that

$$fV = fV_1 \oplus \cdots \oplus fV_k.$$

6. Let T, V, and F be as in Exercise 5. Suppose $\alpha$ and $\beta$ are vectors in V which
have the same T-annihilator. Prove that, for any polynomial f, the vectors $f\alpha$
and $f\beta$ have the same T-annihilator.


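7. Find the minimal polynomials and the rational forms of each of the following
real matrices.

$$\begin{bmatrix} 0 & -1 & -1 \\ 1 & 0 & 0 \\ -1 & 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} c & 0 & -1 \\ 0 & c & 1 \\ -1 & 1 & c \end{bmatrix}, \qquad \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$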


8. Let T be the linear operator on $R^3$ which is represented in the standard ordered basis by

$$\begin{bmatrix} 3 & -4 & -4 \\ -1 & 3 & 2 \\ 2 & -4 & -3 \end{bmatrix}.$$

Find non-zero vectors $\alpha_1, \ldots, \alpha_r$ satisfying the conditions of Theorem 3.


9. Let A be the real matrix

$$A = \begin{bmatrix} 1 & 3 & 3 \\ 3 & 1 & 3 \\ -3 & -3 & -5 \end{bmatrix}.$$

Find an invertible $3 \times 3$ real matrix P such that $P^{-1}AP$ is in rational form.


10. Let F be a subfield of the complex numbers and let T be the linear operator on $F^4$ which is represented in the standard ordered basis by the matrix

$$\begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & a & 2 & 0 \\ 0 & 0 & b & 2 \end{bmatrix}.$$

Find the characteristic polynomial for T. Consider the cases $a = b = 1$; $a = b = 0$; $a = 0, b = 1$. In each of these cases, find the minimal polynomial for T and non-zero vectors $\alpha_1, \ldots, \alpha_r$ which satisfy the conditions of Theorem 3.


11. Prove that if A and B are $3 \times 3$ matrices over the field F, a necessary and sufficient condition that A and B be similar over F is that they have the same characteristic polynomial and the same minimal polynomial. Give an example which shows that this is false for $4 \times 4$ matrices.


12. Let F be a subfield of the field of complex numbers, and let A and B be $n \times n$ matrices over F. Prove that if A and B are similar over the field of complex numbers, then they are similar over F. (Hint: Prove that the rational form of A is the same whether A is viewed as a matrix over F or a matrix over C; likewise for B.)


13. Let A be an $n \times n$ matrix with complex entries. Prove that if every characteristic value of A is real, then A is similar to a matrix with real entries.


14. Let T be a linear operator on the finite-dimensional space V. Prove that there exists a vector $\alpha$ in V with this property: if f is a polynomial and $f(T)\alpha = 0$, then $f(T) = 0$. (Such a vector $\alpha$ is called a separating vector for the algebra of polynomials in T.) When T has a cyclic vector, give a direct proof that any cyclic vector is a separating vector for the algebra of polynomials in T.


15. Let F be a subfield of the field of complex numbers, and let A be an $n \times n$ matrix over F. Let p be the minimal polynomial for A. If we regard A as a matrix over C, then A has a minimal polynomial f as an $n \times n$ matrix over C. Use a theorem on linear equations to prove $p = f$. Can you also see how this follows from the cyclic decomposition theorem?


16. Let A be an $n \times n$ matrix with real entries such that $A^2 + I = 0$. Prove that n is even, and if $n = 2k$, then A is similar over the field of real numbers to a matrix of the block form

$$\begin{bmatrix} 0 & -I \\ I & 0 \end{bmatrix}$$

where I is the $k \times k$ identity matrix.


17. Let T be a linear operator on a finite-dimensional vector space V. Suppose that 
(a) the minimal polynomial for T is a power of an irreducible polynomial; 
(b) the minimal polynomial is equal to the characteristic polynomial. 
Show that no non-trivial T-invariant subspace has a complementary T-invari- 
ant subspace. 


18. If T is a diagonalizable linear operator, then every T-invariant subspace has 
a complementary T-invariant subspace. 


19. Let T be a linear operator on the finite-dimensional space V. Prove that T
has a cyclic vector if and only if the following is true: Every linear operator U 
which commutes with T is a polynomial in T. 


20. Let V be a finite-dimensional vector space over the field F, and let T be a 
linear operator on V. We ask when it is true that every non-zero vector in V is a
cyclic vector for T. Prove that this is the case if and only if the characteristic 
polynomial for T is irreducible over F. 
21. Let A be an $n \times n$ matrix with real entries. Let T be the linear operator on $R^n$ which is represented by A in the standard ordered basis, and let U be the linear operator on $C^n$ which is represented by A in the standard ordered basis. Use the result of Exercise 20 to prove the following: If the only subspaces invariant under T are $R^n$ and the zero subspace, then U is diagonalizable.


7.3. The Jordan Form 


Suppose that N is a nilpotent linear operator on the finite-dimensional space V. Let us look at the cyclic decomposition for N which we obtain from Theorem 3. We have a positive integer r and r non-zero vectors $\alpha_1, \ldots, \alpha_r$ in V with N-annihilators $p_1, \ldots, p_r$, such that

$$V = Z(\alpha_1; N) \oplus \cdots \oplus Z(\alpha_r; N)$$

and $p_{i+1}$ divides $p_i$ for $i = 1, \ldots, r - 1$. Since N is nilpotent, the minimal polynomial is $x^k$ for some $k \le n$. Thus each $p_i$ is of the form $p_i = x^{k_i}$, and the divisibility condition simply says that

$$k_1 \ge k_2 \ge \cdots \ge k_r.$$

Of course, $k_1 = k$ and $k_r \ge 1$. The companion matrix of $x^{k_i}$ is the $k_i \times k_i$ matrix

$$A_i = \begin{bmatrix} 0 & 0 & \cdots & 0 & 0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{bmatrix}. \tag{7-24}$$


Thus Theorem 3 gives us an ordered basis for V in which the matrix of N is the direct sum of the elementary nilpotent matrices (7-24), the sizes of which decrease as i increases. One sees from this that associated with a nilpotent $n \times n$ matrix is a positive integer r and r positive integers $k_1, \ldots, k_r$ such that $k_1 + \cdots + k_r = n$ and $k_i \ge k_{i+1}$, and these positive integers determine the rational form of the matrix, i.e., determine the matrix up to similarity.

Here is one thing we should like to point out about the nilpotent 
operator N above. The positive integer r is precisely the nullity of N; 
in fact, the null space has as a basis the r vectors 


$$N^{k_1 - 1}\alpha_1, \ldots, N^{k_r - 1}\alpha_r. \tag{7-25}$$

For, let $\alpha$ be in the null space of N. We write $\alpha$ in the form

$$\alpha = f_1\alpha_1 + \cdots + f_r\alpha_r$$

where $f_i$ is a polynomial, the degree of which we may assume is less than $k_i$. Since $N\alpha = 0$, and since V is the direct sum of the N-invariant subspaces $Z(\alpha_i; N)$, for each i we have

$$0 = N(f_i\alpha_i) = Nf_i(N)\alpha_i = (xf_i)\alpha_i.$$

Thus $xf_i$ is divisible by $x^{k_i}$, and since $\deg(f_i) < k_i$ this means that

$$f_i = c_i x^{k_i - 1}$$

where $c_i$ is some scalar. But then

$$\alpha = c_1(x^{k_1 - 1}\alpha_1) + \cdots + c_r(x^{k_r - 1}\alpha_r)$$

which shows us that the vectors (7-25) form a basis for the null space of N.
The reader should note that this fact is also quite clear from the matrix 
point of view. 
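The claim that the nullity of N equals r can be checked on a small case. This Python sketch (the helper names and the block sizes are illustrative assumptions, not from the text) builds the direct sum of elementary nilpotent matrices (7-24), computes the nullity with exact rational Gaussian elimination, and finds the least k with $N^k = 0$, which should be $k_1$.

```python
from fractions import Fraction


def elementary_nilpotent(k):
    """The k x k matrix (7-24): 1's immediately below the diagonal, 0's elsewhere."""
    return [[1 if i == j + 1 else 0 for j in range(k)] for i in range(k)]


def direct_sum(blocks):
    """Block-diagonal matrix with the given square blocks down the diagonal."""
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                out[offset + i][offset + j] = entry
        offset += len(b)
    return out


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(matrix):
    """Rank over the rationals, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


sizes = [3, 2, 2]  # k_1 >= k_2 >= k_3, an arbitrary illustrative choice
N = direct_sum([elementary_nilpotent(k) for k in sizes])
n = len(N)
nullity = n - rank(N)

# Least k with N^k = 0; it should equal k_1, the exponent in the minimal polynomial x^k.
power, k = N, 1
while any(any(row) for row in power):
    power = matmul(power, N)
    k += 1

print(n, nullity, k)  # 7 3 3
```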

Now what we wish to do is to combine our findings about nilpotent 
operators or matrices with the primary decomposition theorem of Chapter 
6. The situation is this: Suppose that T is a linear operator on V and that 
the characteristic polynomial for T factors over F as follows:

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}$$

where $c_1, \ldots, c_k$ are distinct elements of F and $d_i \ge 1$. Then the minimal polynomial for T will be

$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}$$

where $1 \le r_i \le d_i$. If $W_i$ is the null space of $(T - c_iI)^{r_i}$, then the primary decomposition theorem tells us that

$$V = W_1 \oplus \cdots \oplus W_k$$

and that the operator $T_i$ induced on $W_i$ by T has minimal polynomial $(x - c_i)^{r_i}$. Let $N_i$ be the linear operator on $W_i$ defined by $N_i = T_i - c_iI$. Then $N_i$ is nilpotent and has minimal polynomial $x^{r_i}$. On $W_i$, T acts like $N_i$ plus the scalar $c_i$ times the identity operator. Suppose we choose a basis for the subspace $W_i$ corresponding to the cyclic decomposition for the nilpotent operator $N_i$. Then the matrix of $T_i$ in this ordered basis will
be the direct sum of matrices

$$\begin{bmatrix} c & 0 & \cdots & 0 & 0 \\ 1 & c & \cdots & 0 & 0 \\ \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & \cdots & c & 0 \\ 0 & 0 & \cdots & 1 & c \end{bmatrix} \tag{7-26}$$

each with $c = c_i$. Furthermore, the sizes of these matrices will decrease as one reads from left to right. A matrix of the form (7-26) is called an elementary Jordan matrix with characteristic value c. Now if we put all the bases for the $W_i$ together, we obtain an ordered basis for V. Let us describe the matrix A of T in this ordered basis.
The matrix A is the direct sum

$$A = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A_k \end{bmatrix} \tag{7-27}$$

of matrices $A_1, \ldots, A_k$. Each $A_i$ is of the form

$$A_i = \begin{bmatrix} J_1^{(i)} & 0 & \cdots & 0 \\ 0 & J_2^{(i)} & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & J_{n_i}^{(i)} \end{bmatrix}$$

where each $J_j^{(i)}$ is an elementary Jordan matrix with characteristic value $c_i$. Also, within each $A_i$, the sizes of the matrices $J_j^{(i)}$ decrease as j increases. An $n \times n$ matrix A which satisfies all the conditions described so far in this paragraph (for some distinct scalars $c_1, \ldots, c_k$) will be said to be in Jordan form.

We have just pointed out that if T is a linear operator for which the 
characteristic polynomial factors completely over the scalar field, then 
there is an ordered basis for V in which T is represented by a matrix which 
is in Jordan form. We should like to show now that this matrix is some- 
thing uniquely associated with T, up to the order in which the charac- 
teristic values of T are written down. In other words, if two matrices are 
in Jordan form and they are similar, then they can differ only in that the 
order of the scalars $c_i$ is different.

The uniqueness we see as follows. Suppose there is some ordered basis for V in which T is represented by the Jordan matrix A described in the previous paragraph. If $A_i$ is a $d_i \times d_i$ matrix, then $d_i$ is clearly the multiplicity of $c_i$ as a root of the characteristic polynomial for A, or for T. In other words, the characteristic polynomial for T is

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}.$$

This shows that $c_1, \ldots, c_k$ and $d_1, \ldots, d_k$ are unique, up to the order in which we write them. The fact that A is the direct sum of the matrices $A_i$ gives us a direct sum decomposition $V = W_1 \oplus \cdots \oplus W_k$ invariant under T. Now note that $W_i$ must be the null space of $(T - c_iI)^n$, where $n = \dim V$; for, $A_i - c_iI$ is clearly nilpotent and $A_j - c_iI$ is non-singular for $j \ne i$. So we see that the subspaces $W_i$ are unique. If $T_i$ is the operator induced on $W_i$ by T, then $A_i - c_iI$ is uniquely determined as the rational form for $(T_i - c_iI)$, and hence the matrix $A_i$ is uniquely determined as well.

Now we wish to make some further observations about the operator 
T and the Jordan matrix A which represents T in some ordered basis. 
We shall list a string of observations: 


(1) Every entry of A not on or immediately below the main diagonal is 0. On the diagonal of A occur the k distinct characteristic values $c_1, \ldots, c_k$ of T. Also, $c_i$ is repeated $d_i$ times, where $d_i$ is the multiplicity of $c_i$ as a root of the characteristic polynomial, i.e., $d_i = \dim W_i$.

(2) For each i, the matrix $A_i$ is the direct sum of $n_i$ elementary Jordan matrices $J_j^{(i)}$ with characteristic value $c_i$. The number $n_i$ is precisely the dimension of the space of characteristic vectors associated with the characteristic value $c_i$. For, $n_i$ is the number of elementary nilpotent blocks in the rational form for $(T_i - c_iI)$, and is thus equal to the dimension of the null space of $(T - c_iI)$. In particular notice that T is diagonalizable if and only if $n_i = d_i$ for each i.

(3) For each i, the first block $J_1^{(i)}$ in the matrix $A_i$ is an $r_i \times r_i$ matrix, where $r_i$ is the multiplicity of $c_i$ as a root of the minimal polynomial for T. This follows from the fact that the minimal polynomial for the nilpotent operator $(T_i - c_iI)$ is $x^{r_i}$.
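Observations (2) and (3) say that the number of blocks for $c_i$ is the nullity of $T - c_iI$ and that the largest block has size $r_i$. A standard refinement, which we assume here rather than derive from the text, recovers every block size: the number of elementary Jordan blocks for c of size at least j equals $\operatorname{rank}((A - cI)^{j-1}) - \operatorname{rank}((A - cI)^{j})$. The Python sketch below applies that formula with exact rational arithmetic, so it assumes c and the entries of A are rational; all names are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank over the rationals, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def jordan_block(c, k):
    """Elementary Jordan matrix (7-26): c on the diagonal, 1 just below it."""
    return [[c if i == j else (1 if i == j + 1 else 0) for j in range(k)] for i in range(k)]


def direct_sum(blocks):
    n = sum(len(b) for b in blocks)
    out = [[0] * n for _ in range(n)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, entry in enumerate(row):
                out[offset + i][offset + j] = entry
        offset += len(b)
    return out


def jordan_block_sizes(A, c):
    """Sizes (largest first) of the elementary Jordan blocks for c, from ranks of (A - cI)^j."""
    n = len(A)
    B = [[A[i][j] - (c if i == j else 0) for j in range(n)] for i in range(n)]
    power = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    ranks = [n]  # rank of (A - cI)^0 = I
    for _ in range(n):
        power = matmul(power, B)
        ranks.append(rank(power))
    # at_least[j - 1] = number of blocks of size >= j (with a trailing 0 for j = n + 1)
    at_least = [ranks[j - 1] - ranks[j] for j in range(1, n + 1)] + [0]
    sizes = []
    for j in range(1, n + 1):
        sizes += [j] * (at_least[j - 1] - at_least[j])
    return sorted(sizes, reverse=True)


A = direct_sum([jordan_block(2, 3), jordan_block(2, 1), jordan_block(-1, 2)])
print(jordan_block_sizes(A, 2), jordan_block_sizes(A, -1))  # [3, 1] [2]
```

On this sample matrix the number of blocks for $c = 2$ equals the nullity of $A - 2I$, and the first block has size 3, the exponent of $(x - 2)$ in the minimal polynomial, as (2) and (3) predict.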


Of course we have as usual the straight matrix result. If B is an $n \times n$ matrix over the field F and if the characteristic polynomial for B factors completely over F, then B is similar over F to an $n \times n$ matrix A in Jordan form, and A is unique up to a rearrangement of the order of its characteristic values. We call A the Jordan form of B.

Also, note that if F is an algebraically closed field, then the above remarks apply to every linear operator on a finite-dimensional space over F, or to every $n \times n$ matrix over F. Thus, for example, every $n \times n$ matrix over the field of complex numbers is similar to an essentially unique matrix in Jordan form.


EXAMPLE 5. Suppose T is a linear operator on $C^2$. The characteristic polynomial for T is either $(x - c_1)(x - c_2)$ where $c_1$ and $c_2$ are distinct complex numbers, or is $(x - c)^2$. In the former case, T is diagonalizable and is represented in some ordered basis by

$$\begin{bmatrix} c_1 & 0 \\ 0 & c_2 \end{bmatrix}.$$

In the latter case, the minimal polynomial for T may be $(x - c)$, in which case $T = cI$, or may be $(x - c)^2$, in which case T is represented in some ordered basis by the matrix

$$\begin{bmatrix} c & 0 \\ 1 & c \end{bmatrix}.$$

Thus every $2 \times 2$ matrix over the field of complex numbers is similar to a matrix of one of the two types displayed above, possibly with $c_1 = c_2$.


EXAMPLE 6. Let A be the complex $3 \times 3$ matrix

$$A = \begin{bmatrix} 2 & 0 & 0 \\ a & 2 & 0 \\ b & c & -1 \end{bmatrix}.$$

The characteristic polynomial for A is obviously $(x - 2)^2(x + 1)$. Either this is the minimal polynomial, in which case A is similar to

$$\begin{bmatrix} 2 & 0 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix},$$

or the minimal polynomial is $(x - 2)(x + 1)$, in which case A is similar to

$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$

Now

$$(A - 2I)(A + I) = \begin{bmatrix} 0 & 0 & 0 \\ 3a & 0 & 0 \\ ac & 0 & 0 \end{bmatrix}$$

and thus A is similar to a diagonal matrix if and only if $a = 0$.
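The product in Example 6 is easy to confirm mechanically. The Python sketch below multiplies $A - 2I$ by $A + I$ for sample integer values of a, b, c, which stand in for arbitrary complex numbers; the function name is hypothetical.

```python
def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def example6_product(a, b, c):
    """(A - 2I)(A + I) for A = [[2, 0, 0], [a, 2, 0], [b, c, -1]]."""
    A = [[2, 0, 0], [a, 2, 0], [b, c, -1]]

    def shifted(s):
        # A + sI
        return [[A[i][j] + (s if i == j else 0) for j in range(3)] for i in range(3)]

    return matmul(shifted(-2), shifted(1))


print(example6_product(5, 7, 11))  # [[0, 0, 0], [15, 0, 0], [55, 0, 0]]
print(example6_product(0, 7, 11))  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

The entry b never survives, which matches the displayed product: only a decides whether the minimal polynomial drops to $(x - 2)(x + 1)$.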


EXAMPLE 7. Let

$$A = \begin{bmatrix} 2 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & a & 2 \end{bmatrix}.$$

The characteristic polynomial for A is $(x - 2)^4$. Since A is the direct sum of two $2 \times 2$ matrices, it is clear that the minimal polynomial for A is $(x - 2)^2$. Now if $a = 0$ or if $a = 1$, then the matrix A is in Jordan form. Notice that the two matrices we obtain for $a = 0$ and $a = 1$ have the same characteristic polynomial and the same minimal polynomial, but are not similar. They are not similar because for the first matrix the solution space of $(A - 2I)$ has dimension 3, while for the second matrix it has dimension 2.
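The two dimensions in Example 7 can be recomputed directly. The sketch assumes A is the $4 \times 4$ block matrix with diagonal blocks $\begin{bmatrix} 2 & 0 \\ 1 & 2 \end{bmatrix}$ and $\begin{bmatrix} 2 & 0 \\ a & 2 \end{bmatrix}$. It computes the dimension of the solution space of $(A - 2I)X = 0$ by exact elimination and also checks that $(A - 2I)^2 = 0$, so both matrices share the minimal polynomial $(x - 2)^2$. The helper names are hypothetical.

```python
from fractions import Fraction


def rank(matrix):
    """Rank over the rationals, by exact Gaussian elimination."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def example7(a):
    """The 4 x 4 matrix of Example 7: blocks [[2, 0], [1, 2]] and [[2, 0], [a, 2]]."""
    return [[2, 0, 0, 0], [1, 2, 0, 0], [0, 0, 2, 0], [0, 0, a, 2]]


def minus_2I(A):
    return [[A[i][j] - (2 if i == j else 0) for j in range(4)] for i in range(4)]


for a in (0, 1):
    B = minus_2I(example7(a))
    square_is_zero = all(v == 0 for row in matmul(B, B) for v in row)
    # a, dimension of the solution space of (A - 2I)X = 0, and whether (A - 2I)^2 = 0
    print(a, 4 - rank(B), square_is_zero)
# 0 3 True
# 1 2 True
```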


EXAMPLE 8. Linear differential equations with constant coefficients (Example 14, Chapter 6) provide a nice illustration of the Jordan form. Let $a_0, \ldots, a_{n-1}$ be complex numbers and let V be the space of all n times differentiable functions f on an interval of the real line which satisfy the differential equation

$$\frac{d^n f}{dx^n} + a_{n-1}\frac{d^{n-1} f}{dx^{n-1}} + \cdots + a_1\frac{df}{dx} + a_0 f = 0.$$

Let D be the differentiation operator. Then V is invariant under D, because V is the null space of $p(D)$, where

$$p = x^n + \cdots + a_1 x + a_0.$$

What is the Jordan form for the differentiation operator on V?
Let $c_1, \ldots, c_k$ be the distinct complex roots of p:

$$p = (x - c_1)^{r_1} \cdots (x - c_k)^{r_k}.$$

Let $V_i$ be the null space of $(D - c_iI)^{r_i}$, that is, the set of solutions to the differential equation

$$(D - c_iI)^{r_i} f = 0.$$

Then as we noted in Example 15, Chapter 6 the primary decomposition theorem tells us that

$$V = V_1 \oplus \cdots \oplus V_k.$$

Let $N_i$ be the restriction of $D - c_iI$ to $V_i$. The Jordan form for the operator D (on V) is then determined by the rational forms for the nilpotent operators $N_1, \ldots, N_k$ on the spaces $V_1, \ldots, V_k$.

So, what we must know (for various values of c) is the rational form for the operator $N = (D - cI)$ on the space $V_c$, which consists of the solutions of the equation

$$(D - cI)^r f = 0,$$

where r is the multiplicity of c as a root of p.

How many elementary nilpotent blocks will there be in the rational form for N? The number will be the nullity of N, i.e., the dimension of the characteristic space associated with the characteristic value c. That dimension is 1, because any function which satisfies the differential equation

$$Df = cf$$

is a scalar multiple of the exponential function $h(x) = e^{cx}$. Therefore, the operator N (on the space $V_c$) has a cyclic vector. A good choice for a cyclic vector is $g = x^{r-1}h$:

$$g(x) = x^{r-1}e^{cx}.$$

This gives

$$Ng = (r - 1)x^{r-2}h, \quad \ldots, \quad N^{r-1}g = (r - 1)!\,h.$$

The preceding paragraph shows us that the Jordan form for D (on the space V) is the direct sum of k elementary Jordan matrices, one for each root $c_i$.
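The computation of Ng rests on the product rule: $(D - cI)(q(x)e^{cx}) = q'(x)e^{cx}$. Assuming, as in the discussion of Chapter 6, that $V_c$ consists of the functions $q(x)e^{cx}$ with $\deg q < r$, the operator N acts on the polynomial factor simply by differentiation, and c drops out. The Python sketch below stores q as a coefficient list and iterates N on $g = x^{r-1}e^{cx}$; the names and the choice $r = 4$ are illustrative.

```python
from math import factorial


def apply_N(q):
    """N = D - cI on V_c, acting on f(x) = q(x) e^{cx} through the polynomial factor q.

    By the product rule, (D - cI)(q e^{cx}) = q' e^{cx}, so N just differentiates q.
    q is a coefficient list with q[m] the coefficient of x**m.
    """
    return [m * q[m] for m in range(1, len(q))]


r = 4  # illustrative multiplicity of the root c
g = [0] * (r - 1) + [1]  # g(x) = x^(r-1) e^{cx}
iterates = [g]  # g, Ng, N^2 g, ...
for _ in range(r):
    iterates.append(apply_N(iterates[-1]))

print(iterates)  # [[0, 0, 0, 1], [0, 0, 3], [0, 6], [6], []]
```

The iterates $g, Ng, \ldots, N^{r-1}g$ have polynomial factors of degrees $r - 1, r - 2, \ldots, 0$, so they are linearly independent, and under the assumption above they span the r-dimensional space $V_c$; this is another way to see that g is a cyclic vector for N.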


Exercises 


1. Let $N_1$ and $N_2$ be $3 \times 3$ nilpotent matrices over the field F. Prove that $N_1$ and $N_2$ are similar if and only if they have the same minimal polynomial.

2. Use the result of Exercise 1 and the Jordan form to prove the following: Let A and B be $n \times n$ matrices over the field F which have the same characteristic polynomial

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}$$

and the same minimal polynomial. If no $d_i$ is greater than 3, then A and B are similar.


3. If A is a complex $5 \times 5$ matrix with characteristic polynomial

$$f = (x - 2)^3(x + 7)^2$$

and minimal polynomial $p = (x - 2)^2(x + 7)$, what is the Jordan form for A?

4. How many possible Jordan forms are there for a $6 \times 6$ complex matrix with characteristic polynomial $(x + 2)^4(x - 1)^2$?


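5. The differentiation operator on the space of polynomials of degree less than or equal to 3 is represented in the 'natural' ordered basis by the matrix

$$\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 3 \\ 0 & 0 & 0 & 0 \end{bmatrix}.$$

What is the Jordan form of this matrix? (F a subfield of the complex numbers.)

6. Let A be the complex matrix

$$\begin{bmatrix} 2 & 0 & 0 & 0 & 0 & 0 \\ 1 & 2 & 0 & 0 & 0 & 0 \\ -1 & 0 & 2 & 0 & 0 & 0 \\ 0 & 1 & 0 & 2 & 0 & 0 \\ 1 & 1 & 1 & 1 & 2 & 0 \\ 0 & 0 & 0 & 0 & 1 & -1 \end{bmatrix}.$$

Find the Jordan form for A.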
7. If A is an $n \times n$ matrix over the field F with characteristic polynomial

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k},$$

what is the trace of A?

8. Classify up to similarity all $3 \times 3$ complex matrices A such that $A^3 = I$.

9. Classify up to similarity all $n \times n$ complex matrices A such that $A^n = I$.

10. Let n be a positive integer, $n \ge 2$, and let N be an $n \times n$ matrix over the field F such that $N^n = 0$ but $N^{n-1} \ne 0$. Prove that N has no square root, i.e., that there is no $n \times n$ matrix A such that $A^2 = N$.


11. Let $N_1$ and $N_2$ be $6 \times 6$ nilpotent matrices over the field F. Suppose that $N_1$ and $N_2$ have the same minimal polynomial and the same nullity. Prove that $N_1$ and $N_2$ are similar. Show that this is not true for $7 \times 7$ nilpotent matrices.

12. Use the result of Exercise 11 and the Jordan form to prove the following: Let A and B be $n \times n$ matrices over the field F which have the same characteristic polynomial

$$f = (x - c_1)^{d_1} \cdots (x - c_k)^{d_k}$$

and the same minimal polynomial. Suppose also that for each i the solution spaces of $(A - c_iI)$ and $(B - c_iI)$ have the same dimension. If no $d_i$ is greater than 6, then A and B are similar.

13. If N is a $k \times k$ elementary nilpotent matrix, i.e., $N^k = 0$ but $N^{k-1} \ne 0$, show that $N^t$ is similar to N. Now use the Jordan form to prove that every complex $n \times n$ matrix is similar to its transpose.

14. What's wrong with the following proof? If A is a complex $n \times n$ matrix such that $A^t = -A$, then A is 0. (Proof: Let J be the Jordan form of A. Since $A^t = -A$, $J^t = -J$. But J is triangular so that $J^t = -J$ implies that every entry of J is zero. Since $J = 0$ and A is similar to J, we see that $A = 0$.) (Give an example of a non-zero A such that $A^t = -A$.)

15. If N is a nilpotent $3 \times 3$ matrix over C, prove that $A = I + \frac{1}{2}N - \frac{1}{8}N^2$ satisfies $A^2 = I + N$, i.e., A is a square root of $I + N$. Use the binomial series for $(1 + t)^{1/2}$ to obtain a similar formula for a square root of $I + N$, where N is any nilpotent $n \times n$ matrix over C.

16. Use the result of Exercise 15 to prove that if c is a non-zero complex number and N is a nilpotent complex matrix, then $(cI + N)$ has a square root. Now use the Jordan form to prove that every non-singular complex $n \times n$ matrix has a square root.




7.4. Computation of Invariant Factors 


Suppose that A is an $n \times n$ matrix with entries in the field F. We wish to find a method for computing the invariant factors $p_1, \ldots, p_r$ which define the rational form for A. Let us begin with the very simple case in which A is the companion matrix (7-2) of a monic polynomial

$$p = x^n + c_{n-1}x^{n-1} + \cdots + c_1x + c_0.$$

In Section 7.1 we saw that p is both the minimal and the characteristic polynomial for the companion matrix A. Now, we want to give a direct calculation which shows that p is the characteristic polynomial for A. In this case,

$$xI - A = \begin{bmatrix} x & 0 & 0 & \cdots & 0 & c_0 \\ -1 & x & 0 & \cdots & 0 & c_1 \\ 0 & -1 & x & \cdots & 0 & c_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & x & c_{n-2} \\ 0 & 0 & 0 & \cdots & -1 & x + c_{n-1} \end{bmatrix}.$$


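Add x times row n to row $(n - 1)$. This will remove the x in the $(n - 1, n - 1)$ place and it will not change the determinant. Then, add x times the new row $(n - 1)$ to row $(n - 2)$. Continue successively until all of the x's on the main diagonal have been removed by that process. The result is the matrix

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & x^n + \cdots + c_1x + c_0 \\ -1 & 0 & 0 & \cdots & 0 & x^{n-1} + \cdots + c_2x + c_1 \\ 0 & -1 & 0 & \cdots & 0 & x^{n-2} + \cdots + c_3x + c_2 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & x^2 + c_{n-1}x + c_{n-2} \\ 0 & 0 & 0 & \cdots & -1 & x + c_{n-1} \end{bmatrix}$$

which has the same determinant as $xI - A$. The upper right-hand entry of this matrix is the polynomial p. We clean up the last column by adding to it appropriate multiples of the other columns:

$$\begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & p \\ -1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & -1 & 0 \end{bmatrix}.$$

Multiply each of the first $(n - 1)$ columns by $-1$ and then perform $(n - 1)$ interchanges of adjacent columns to bring the present column n to the first position. The total effect of the $2n - 2$ sign changes is to leave the determinant unaltered. We obtain the matrix

$$\begin{bmatrix} p & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}. \tag{7-28}$$

It is then clear that $p = \det(xI - A)$.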
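The determinant identity can also be spot-checked numerically. The Python sketch below (function names and the sample polynomial are hypothetical) builds the companion matrix of a monic p, evaluates $\det(tI - A)$ at several rational points t by exact Gaussian elimination, and compares the result with $p(t)$. Both sides are monic polynomials of degree n in t, so agreement at $n + 1$ points already forces $\det(xI - A) = p$ for that particular p; this is a check, not a replacement for the row-reduction argument above.

```python
from fractions import Fraction


def companion(coeffs):
    """Companion matrix of p = x^n + c_{n-1} x^{n-1} + ... + c_0, with coeffs = [c_0, ..., c_{n-1}].

    1's lie just below the diagonal and the last column is (-c_0, ..., -c_{n-1}).
    """
    n = len(coeffs)
    A = [[Fraction(0)] * n for _ in range(n)]
    for i in range(1, n):
        A[i][i - 1] = Fraction(1)
    for i in range(n):
        A[i][n - 1] = -Fraction(coeffs[i])
    return A


def det(M):
    """Determinant by exact Gaussian elimination (row swaps flip the sign)."""
    M = [row[:] for row in M]
    n = len(M)
    result = Fraction(1)
    for col in range(n):
        pivot = next((i for i in range(col, n) if M[i][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            result = -result
        result *= M[col][col]
        for i in range(col + 1, n):
            factor = M[i][col] / M[col][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    return result


def char_poly_at(A, t):
    """det(tI - A) for a number t."""
    n = len(A)
    return det([[(t if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)])


def p_at(coeffs, t):
    """Horner evaluation of the monic p with lower coefficients coeffs = [c_0, ..., c_{n-1}]."""
    value = Fraction(1)  # leading coefficient
    for c in reversed(coeffs):
        value = value * t + c
    return value


coeffs = [5, -3, 0, 2]  # hypothetical example: p = x^4 + 2x^3 - 3x + 5
A = companion(coeffs)
checks = [char_poly_at(A, t) == p_at(coeffs, t) for t in range(-3, 4)]  # 7 > 4 points
print(all(checks))  # True
```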

We are going to show that, for any $n \times n$ matrix A, there is a succession of row and column operations which will transform $xI - A$ into a matrix much like (7-28), in which the invariant factors of A appear down the main diagonal. Let us be completely clear about the operations we shall use.

We shall be concerned with $F[x]^{m \times n}$, the collection of $m \times n$ matrices with entries which are polynomials over the field F. If M is such a matrix, an elementary row operation on M is one of the following:

1. multiplication of one row of M by a non-zero scalar in F;

2. replacement of the rth row of M by row r plus f times row s, where f is any polynomial over F and $r \ne s$;

3. interchange of two rows of M.


The inverse operation of an elementary row operation is an elementary row operation of the same type. Notice that we could not make such an assertion if we allowed non-scalar polynomials in (1). An $m \times m$ elementary matrix, that is, an elementary matrix in $F[x]^{m \times m}$, is one which can be obtained from the $m \times m$ identity matrix by means of a single elementary row operation. Clearly each elementary row operation on M can be effected by multiplying M on the left by a suitable $m \times m$ elementary matrix; in fact, if e is the operation, then

$$e(M) = e(I)M.$$
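The identity $e(M) = e(I)M$ can be illustrated for an operation of type 2. In the Python sketch below, a polynomial is a list of integer coefficients with the constant term first and [] standing for 0; the helper names and the sample $3 \times 2$ matrix are hypothetical. It applies the operation "replace row 1 by row 1 plus $(1 + x)$ times row 3" directly to M, applies the same operation to the identity matrix and multiplies M on the left by the result, and compares the two.

```python
def trim(p):
    """Drop zero leading coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0) for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def matmul(A, B):
    """Product of polynomial matrices (lists of rows of coefficient lists)."""
    result = []
    for row in A:
        new_row = []
        for j in range(len(B[0])):
            entry = []
            for t in range(len(B)):
                entry = add(entry, mul(row[t], B[t][j]))
            new_row.append(entry)
        result.append(new_row)
    return result


def add_multiple_of_row(M, r, s, f):
    """Type 2 operation: replace row r by row r plus f times row s (r != s)."""
    out = [list(row) for row in M]
    out[r] = [add(a, mul(f, b)) for a, b in zip(M[r], M[s])]
    return out


# Hypothetical 3 x 2 matrix over F[x]; entries listed constant term first.
M = [[[0, 1], [1]],          # x,    1
     [[0, 0, 1], [1, 1]],    # x^2,  1 + x
     [[2], [-1, 0, 1]]]      # 2,    x^2 - 1
identity = [[[1] if i == j else [] for j in range(3)] for i in range(3)]
f = [1, 1]  # f = 1 + x

# e = "add f times row 3 to row 1" (rows 0 and 2 in 0-based indexing)
lhs = add_multiple_of_row(M, 0, 2, f)                     # e(M)
rhs = matmul(add_multiple_of_row(identity, 0, 2, f), M)   # e(I)M
print(lhs == rhs)  # True
```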
Let M, N be matrices in $F[x]^{m \times n}$. We say that N is row-equivalent to M if N can be obtained from M by a finite succession of elementary row operations:

$$M = M_0 \to M_1 \to \cdots \to M_k = N.$$

Evidently N is row-equivalent to M if and only if M is row-equivalent to N, so that we may use the terminology 'M and N are row-equivalent.' If N is row-equivalent to M, then

$$N = PM$$

where the $m \times m$ matrix P is a product of elementary matrices:

$$P = E_k \cdots E_1.$$

In particular, P is an invertible matrix with inverse

$$P^{-1} = E_1^{-1} \cdots E_k^{-1}.$$

Of course, the inverse of $E_i$ comes from the inverse elementary row operation.

All of this is just as it is in the case of matrices with entries in F. It 
parallels the elementary results in Chapter 1. Thus, the next problem 
which suggests itself is to introduce a row-reduced echelon form for poly- 
nomial matrices. Here, we meet a new obstacle. How do we row-reduce 
a matrix? The first step is to single out the leading non-zero entry of row 1 
and to divide every entry of row 1 by that entry. We cannot (necessarily) 
do that when the matrix has polynomial entries. As we shall see in the 
next theorem, we can circumvent this difficulty in certain cases; however, 
there is not any entirely suitable row-reduced form for the general matrix 
in $F[x]^{m \times n}$. If we introduce column operations as well and study the type
of equivalence which results from allowing the use of both types of oper- 
ations, we can obtain a very useful standard form for each matrix. The 
basic tool is the following. 


Lemma. Let M be a matrix in $F[x]^{m \times n}$ which has some non-zero entry in its first column, and let p be the greatest common divisor of the entries in column 1 of M. Then M is row-equivalent to a matrix N which has

$$\begin{bmatrix} p \\ 0 \\ \vdots \\ 0 \end{bmatrix}$$

as its first column.






The Rattonal and Jordan Forms Chap. 7 


Proof. We shall prove something more than we have stated. 

We shall show that there is an algorithm for finding N, i.e., a prescription 

which a machine could use to calculate N in a finite number of steps. 
First, we need some notation. 

Let M be any $m \times n$ matrix with entries in F[x] which has a non-zero first column

$$M_1 = \begin{bmatrix} f_1 \\ \vdots \\ f_m \end{bmatrix}.$$

Define

$$l(M_1) = \min_{f_i \ne 0} \deg f_i, \qquad p(M_1) = \text{g.c.d.}\,(f_1, \ldots, f_m). \tag{7-29}$$

Let j be some index such that $\deg f_j = l(M_1)$. To be specific, let j be the smallest index i for which $\deg f_i = l(M_1)$. Attempt to divide each $f_i$ by $f_j$:

$$f_i = f_jg_i + r_i, \qquad r_i = 0 \text{ or } \deg r_i < \deg f_j. \tag{7-30}$$

For each i different from j, replace row i of M by row i minus $g_i$ times row j. Multiply row j by the reciprocal of the leading coefficient of $f_j$ and then interchange rows j and 1. The result of all these operations is a matrix M' which has for its first column

$$M'_1 = \begin{bmatrix} \tilde{f}_j \\ r_2 \\ \vdots \\ r_{j-1} \\ r_1 \\ r_{j+1} \\ \vdots \\ r_m \end{bmatrix} \tag{7-31}$$

where $\tilde{f}_j$ is the monic polynomial obtained by normalizing $f_j$ to have leading coefficient 1. We have given a well-defined procedure for associating with each M a matrix M' with these properties.

(a) M' is row-equivalent to M.
(b) $p(M'_1) = p(M_1)$.
(c) Either $l(M'_1) < l(M_1)$ or

$$M'_1 = \begin{bmatrix} p(M_1) \\ 0 \\ \vdots \\ 0 \end{bmatrix}.$$

It is easy to verify (b) and (c) from (7-30) and (7-31). Property (c) is just another way of stating that either there is some i such that $r_i \ne 0$ and $\deg r_i < \deg f_j$ or else $r_i = 0$ for all i and $\tilde{f}_j$ is (therefore) the greatest common divisor of $f_1, \ldots, f_m$.

The proof of the lemma is now quite simple. We start with the matrix M and apply the above procedure to obtain M'. Property (c) tells us that either M' will serve as the matrix N in the lemma or $l(M'_1) < l(M_1)$. In the latter case, we apply the procedure to M' to obtain the matrix $M^{(2)} = (M')'$. If $M^{(2)}$ is not a suitable N, we form $M^{(3)} = (M^{(2)})'$, and so on. The point is that the strict inequalities

$$l(M_1) > l(M'_1) > l(M^{(2)}_1) > \cdots$$

cannot continue for very long. After not more than $l(M_1)$ iterations of our procedure, we must arrive at a matrix $M^{(k)}$ which has the properties we seek. ∎
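The procedure in the proof of the lemma is concrete enough to run. Below is a minimal Python sketch, assuming polynomials over the rationals stored as coefficient lists (constant term first, [] for 0); the names `pl6` and `divide` are hypothetical. Each pass carries out the divisions (7-30) and the row operations that produce (7-31), and the loop stops once property (c) yields a first column of the form (gcd, 0, ..., 0).

```python
from fractions import Fraction


def trim(p):
    """Drop zero leading coefficients; the zero polynomial is []."""
    p = list(p)
    while p and p[-1] == 0:
        p.pop()
    return p


def sub(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) - (q[i] if i < len(q) else 0) for i in range(n)])


def mul(p, q):
    if not p or not q:
        return []
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return trim(out)


def divide(f, g):
    """Return (g_i, r_i) with f = g * g_i + r_i and r_i = 0 or deg r_i < deg g, as in (7-30)."""
    quotient = [Fraction(0)] * max(len(f) - len(g) + 1, 0)
    r = trim(f)
    while r and len(r) >= len(g):
        shift = len(r) - len(g)
        coef = r[-1] / g[-1]
        quotient[shift] = coef
        r = sub(r, [Fraction(0)] * shift + [coef * c for c in g])
    return trim(quotient), r


def pl6(M):
    """Row-reduce M (rows of polynomials) until its first column is (gcd, 0, ..., 0).

    Assumes the first column of M has a non-zero entry, as in the lemma.
    """
    M = [[trim([Fraction(c) for c in f]) for f in row] for row in M]
    while True:
        nonzero = [i for i, row in enumerate(M) if row[0]]
        # j = smallest index whose first-column entry has the minimal degree l(M_1)
        j = min(nonzero, key=lambda i: (len(M[i][0]), i))
        f_j = M[j][0]
        for i in nonzero:
            if i != j:
                g_i, _ = divide(M[i][0], f_j)
                # row i <- row i - g_i * (row j) leaves the remainder r_i in column 1
                M[i] = [sub(a, mul(g_i, b)) for a, b in zip(M[i], M[j])]
        lead = f_j[-1]
        M[j] = [[c / lead for c in f] for f in M[j]]  # make f_j monic
        M[0], M[j] = M[j], M[0]                       # interchange rows j and 1
        if not any(row[0] for row in M[1:]):          # property (c): done
            return M


# Hypothetical 2 x 2 example, constant term first: column 1 is (x^2 - 1, x^2 + x).
N = pl6([[[-1, 0, 1], [1]],
         [[0, 1, 1], [0, 1]]])
print([[str(c) for c in f] for f in (N[0][0], N[1][0])])  # [['1', '1'], []]
```

On the sample matrix the first column $(x^2 - 1, x^2 + x)$ becomes $(x + 1, 0)$, and $x + 1$ is indeed the monic greatest common divisor of the two entries.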


Theorem 6. Let P be an $m \times m$ matrix with entries in the polynomial algebra F[x]. The following are equivalent.

(i) P is invertible.
(ii) The determinant of P is a non-zero scalar polynomial.
(iii) P is row-equivalent to the $m \times m$ identity matrix.
(iv) P is a product of elementary matrices.

Proof. Certainly (i) implies (ii) because the determinant function is multiplicative and the only polynomials invertible in F[x] are the non-zero scalar ones. As a matter of fact, in Chapter 5 we used the classical adjoint to show that (i) and (ii) are equivalent. Our argument here provides a different proof that (i) follows from (ii). We shall complete the merry-go-round

$$\text{(i)} \Rightarrow \text{(ii)} \Rightarrow \text{(iii)} \Rightarrow \text{(iv)} \Rightarrow \text{(i)}.$$

The only implication which is not obvious is that (iii) follows from (ii).

Assume (ii) and consider the first column of P. It contains certain polynomials $p_1, \ldots, p_m$, and

$$\text{g.c.d.}\,(p_1, \ldots, p_m) = 1$$

because any common divisor of $p_1, \ldots, p_m$ must divide (the scalar) det P. Apply the previous lemma to P to obtain a matrix

$$Q = \begin{bmatrix} 1 & a_2 & \cdots & a_m \\ 0 & & & \\ \vdots & & B & \\ 0 & & & \end{bmatrix} \tag{7-32}$$

which is row-equivalent to P. An elementary row operation changes the determinant of a matrix by (at most) a non-zero scalar factor. Thus det Q is a non-zero scalar polynomial. Evidently the $(m - 1) \times (m - 1)$ matrix B in (7-32) has the same determinant as does Q. Therefore, we may apply the last lemma to B. If we continue this way for m steps, we obtain an upper-triangular matrix

$$R = \begin{bmatrix} 1 & a_2 & \cdots & a_m \\ 0 & 1 & \cdots & b_m \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

which is row-equivalent to P. Obviously R is row-equivalent to the $m \times m$ identity matrix. ∎


Corollary. Let M and N be $m \times n$ matrices with entries in the polynomial algebra F[x]. Then N is row-equivalent to M if and only if

$$N = PM$$

where P is an invertible $m \times m$ matrix with entries in F[x].


We now define elementary column operations and column- 
equivalence in a manner analogous to row operations and row-equivalence. 
We do not need a new concept of elementary matrix because the class of 
matrices which can be obtained by performing one elementary column 
operation on the identity matrix is the same as the class obtained by 
using a single elementary row operation. 


Definition. The matrix N is equivalent to the matrix M if we can pass from M to N by means of a sequence of operations

$$M = M_0 \to M_1 \to \cdots \to M_k = N$$

each of which is an elementary row operation or an elementary column operation.

Theorem 7. Let M and N be $m \times n$ matrices with entries in the polynomial algebra F[x]. Then N is equivalent to M if and only if

$$N = PMQ$$

where P is an invertible matrix in $F[x]^{m \times m}$ and Q is an invertible matrix in $F[x]^{n \times n}$.


Theorem 8. Let A be an $n \times n$ matrix with entries in the field F, and let $p_1, \ldots, p_r$ be the invariant factors for A. The matrix $xI - A$ is equivalent to the $n \times n$ diagonal matrix with diagonal entries $p_1, \ldots, p_r, 1, 1, \ldots, 1$.


Proof. There exists an invertible $n \times n$ matrix P, with entries in F, such that $PAP^{-1}$ is in rational form, that is, has the block form

$$PAP^{-1} = \begin{bmatrix} A_1 & 0 & \cdots & 0 \\ 0 & A_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A_r \end{bmatrix}$$

where $A_i$ is the companion matrix of the polynomial $p_i$. According to Theorem 7, the matrix

$$P(xI - A)P^{-1} = xI - PAP^{-1} \tag{7-33}$$

is equivalent to $xI - A$. Now

$$xI - PAP^{-1} = \begin{bmatrix} xI - A_1 & 0 & \cdots & 0 \\ 0 & xI - A_2 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & xI - A_r \end{bmatrix} \tag{7-34}$$

where the various I's we have used are identity matrices of appropriate sizes. At the beginning of this section, we showed that $xI - A_i$ is equivalent to the matrix

$$\begin{bmatrix} p_i & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}.$$

From (7-33) and (7-34) it is then clear that $xI - A$ is equivalent to a diagonal matrix which has the polynomials $p_i$ and $(n - r)$ 1's on its main diagonal. By a succession of row and column interchanges, we can arrange those diagonal entries in any order we choose, for example: $p_1, \ldots, p_r, 1, \ldots, 1$. ∎


Theorem 8 does not give us an effective way of calculating the invariant factors $p_1, \ldots, p_r$ because our proof depends upon the cyclic decomposition theorem. We shall now give an explicit algorithm for reducing a polynomial matrix to diagonal form. Theorem 8 suggests that we may also arrange that successive elements on the main diagonal divide one another.


Definition. Let N be a matrix in $F[x]^{m \times n}$. We say that N is in (Smith) normal form if

(a) every entry off the main diagonal of N is 0;

(b) on the main diagonal of N there appear (in order) polynomials $f_1, \ldots, f_l$ such that $f_k$ divides $f_{k+1}$, $1 \le k \le l - 1$.

In the definition, the number l is $l = \min(m, n)$. The main diagonal entries are $f_k = N_{kk}$, $k = 1, \ldots, l$.


Theorem 9. Let M be an $m \times n$ matrix with entries in the polynomial algebra F[x]. Then M is equivalent to a matrix N which is in normal form.
Proof. If $M = 0$, there is nothing to prove. If $M \ne 0$, we shall give an algorithm for finding a matrix M' which is equivalent to M and which has the form

$$M' = \begin{bmatrix} f_1 & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & R & \\ 0 & & & \end{bmatrix} \tag{7-35}$$

where R is an $(m - 1) \times (n - 1)$ matrix and $f_1$ divides every entry of R. We shall then be finished, because we can apply the same procedure to R and obtain $f_2$, etc.

Let $l(M)$ be the minimum of the degrees of the non-zero entries of M. Find the first column which contains an entry with degree $l(M)$ and interchange that column with column 1. Call the resulting matrix $M^{(0)}$. We describe a procedure for finding a matrix of the form

$$\begin{bmatrix} g & 0 & \cdots & 0 \\ 0 & & & \\ \vdots & & S & \\ 0 & & & \end{bmatrix} \tag{7-36}$$

which is equivalent to M. We begin by applying to the matrix $M^{(0)}$ the procedure of the lemma before Theorem 6, a procedure which we shall call PL6. There results a matrix

$$M^{(1)} = \begin{bmatrix} p & a & \cdots & b \\ 0 & c & \cdots & d \\ \vdots & \vdots & & \vdots \\ 0 & e & \cdots & f \end{bmatrix}. \tag{7-37}$$

If the entries $a, \ldots, b$ are all 0, fine. If not, we use the analogue of PL6 for the first row, a procedure which we might call PL6'. The result is a matrix

$$M^{(2)} = \begin{bmatrix} q & 0 & \cdots & 0 \\ a' & c' & \cdots & e' \\ \vdots & \vdots & & \vdots \\ b' & d' & \cdots & f' \end{bmatrix} \tag{7-38}$$

where q is the greatest common divisor of $p, a, \ldots, b$. In producing $M^{(2)}$, we may or may not have disturbed the nice form of column 1. If we did, we can apply PL6 once again. Here is the point. In not more than $l(M)$ steps:

$$M^{(0)} \xrightarrow{\text{PL6}} M^{(1)} \xrightarrow{\text{PL6}'} M^{(2)} \xrightarrow{\text{PL6}} \cdots \longrightarrow M^{(t)}$$

we must arrive at a matrix $M^{(t)}$ which has the form (7-36), because at each successive step we have $l(M^{(k+1)}) < l(M^{(k)})$. We name the process which we have just defined P7-36:

$$M^{(0)} \xrightarrow{\text{P7-36}} M^{(t)}.$$
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In (7-36), the polynomial g may or may not divide every entry of S. 
If it does not, find the first column which has an entry not divisible by g 
and add that column to column 1. The new first column contains both g 
and an entry gh + r where r ~ 0 and deg r < deg g. Apply process P7-36 
and the result will be another matrix of the form (7-36), where the degree 
of the corresponding g has decreased. 

It should now be obvious that in a finite number of steps we will
obtain (7-35), i.e., we will reach a matrix of the form (7-36) where the
degree of $g$ cannot be further reduced. ∎


We want to show that the normal form associated with a matrix $M$
is unique. Two things we have seen provide clues as to how the polynomials
$f_1, \dots, f_l$ in Theorem 9 are uniquely determined by $M$. First,
elementary row and column operations do not change the determinant
of a square matrix by more than a non-zero scalar factor. Second, elementary
row and column operations do not change the greatest common
divisor of the entries of a matrix.


Definition. Let $M$ be an $m \times n$ matrix with entries in $F[x]$. If
$1 \le k \le \min(m, n)$, we define $\delta_k(M)$ to be the greatest common divisor of
the determinants of all $k \times k$ submatrices of $M$.


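Recall that a $k \times k$ submatrix of $M$ is one obtained by deleting some
$m - k$ rows and some $n - k$ columns of $M$. In other words, we select
certain $k$-tuples

$$
\begin{aligned}
I &= (i_1, \dots, i_k), & 1 \le i_1 < \cdots < i_k \le m \\
J &= (j_1, \dots, j_k), & 1 \le j_1 < \cdots < j_k \le n
\end{aligned}
$$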


and look at the matrix formed using those rows and columns of $M$. We
are interested in the determinants

$$
D_{I,J}(M) = \det \begin{bmatrix}
M_{i_1 j_1} & \cdots & M_{i_1 j_k} \\
\vdots & & \vdots \\
M_{i_k j_1} & \cdots & M_{i_k j_k}
\end{bmatrix}
\tag{7-39}
$$

The polynomial $\delta_k(M)$ is the greatest common divisor of the polynomials
$D_{I,J}(M)$, as $I$ and $J$ range over the possible $k$-tuples.


Theorem 10. If $M$ and $N$ are equivalent $m \times n$ matrices with entries
in $F[x]$, then

$$
\delta_k(M) = \delta_k(N), \qquad 1 \le k \le \min(m, n). \tag{7-40}
$$


Proof. It will suffice to show that a single elementary row operation
$e$ does not change $\delta_k$ (elementary column operations are handled by the
same argument applied to columns). Since the inverse of $e$ is also an elementary row
operation, it will suffice to show this: If a polynomial $f$ divides every
$D_{I,J}(M)$, then $f$ divides $D_{I,J}(e(M))$ for all $k$-tuples $I$ and $J$.

Since we are considering a row operation, let $\alpha_1, \dots, \alpha_m$ be the rows
of $M$ and let us employ the notation

$$
D_J(\alpha_{i_1}, \dots, \alpha_{i_k}) = D_{I,J}(M).
$$


Given $I$ and $J$, what is the relation between $D_{I,J}(M)$ and $D_{I,J}(e(M))$?
Consider the three types of operations $e$:

(a) multiplication of row $r$ by a non-zero scalar $c$;
(b) replacement of row $r$ by row $r$ plus $g$ times row $s$, $r \neq s$;
(c) interchange of rows $r$ and $s$, $r \neq s$.


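Forget about type (c) operations for the moment, and concentrate
on types (a) and (b), which change only row $r$. If $r$ is not one of the indices
$i_1, \dots, i_k$, then

$$
D_{I,J}(e(M)) = D_{I,J}(M).
$$

If $r$ is among the indices $i_1, \dots, i_k$, then in the two cases we have

$$
\begin{aligned}
\text{(a)}\quad D_{I,J}(e(M)) &= D_J(\alpha_{i_1}, \dots, c\alpha_r, \dots, \alpha_{i_k}) \\
&= c\,D_J(\alpha_{i_1}, \dots, \alpha_r, \dots, \alpha_{i_k}) \\
&= c\,D_{I,J}(M); \\[4pt]
\text{(b)}\quad D_{I,J}(e(M)) &= D_J(\alpha_{i_1}, \dots, \alpha_r + g\alpha_s, \dots, \alpha_{i_k}) \\
&= D_{I,J}(M) + g\,D_J(\alpha_{i_1}, \dots, \alpha_s, \dots, \alpha_{i_k}).
\end{aligned}
$$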


For type (a) operations, it is clear that any $f$ which divides $D_{I,J}(M)$
also divides $D_{I,J}(e(M))$. For the case of a type (b) operation, notice that

$$
\begin{aligned}
D_J(\alpha_{i_1}, \dots, \alpha_s, \dots, \alpha_{i_k}) &= 0, && \text{if } s = i_j \text{ for some } j, \\
D_J(\alpha_{i_1}, \dots, \alpha_s, \dots, \alpha_{i_k}) &= \pm D_{I',J}(M), && \text{if } s \neq i_j \text{ for all } j.
\end{aligned}
$$

The $I'$ in the last equation is the $k$-tuple $(i_1, \dots, s, \dots, i_k)$ arranged in
increasing order. It should now be apparent that, if $f$ divides every $D_{I,J}(M)$,
then $f$ divides every $D_{I,J}(e(M))$.

Operations of type (c) can be taken care of by roughly the same
argument or by using the fact that such an operation can be effected by
a sequence of operations of types (a) and (b). ∎


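Corollary. Each matrix $M$ in $F[x]^{m \times n}$ is equivalent to precisely one
matrix $N$ which is in normal form. The polynomials $f_1, \dots, f_l$ which occur
on the main diagonal of $N$ are

$$
f_k = \frac{\delta_k(M)}{\delta_{k-1}(M)}, \qquad 1 \le k \le \min(m, n)
$$

where, for convenience, we define $\delta_0(M) = 1$.

Proof. If $N$ is in normal form with diagonal entries $f_1, \dots, f_l$,
it is quite easy to see that

$$
\delta_k(N) = f_1 f_2 \cdots f_k.
$$

Since $M$ and $N$ are equivalent, Theorem 10 gives $\delta_k(M) = \delta_k(N)$ for each $k$,
which yields the stated formula for $f_k$; in particular, $N$ is uniquely determined by $M$. ∎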
Of course, we call the matrix $N$ in the last corollary the normal form
of $M$. The polynomials $f_1, \dots, f_l$ are often called the invariant factors
of $M$.

Suppose that $A$ is an $n \times n$ matrix with entries in $F$, and let $p_1, \dots, p_r$
be the invariant factors for $A$. We now see that the normal form of the
matrix $xI - A$ has diagonal entries $1, 1, \dots, 1, p_r, \dots, p_1$. The last
corollary tells us what $p_1, \dots, p_r$ are, in terms of submatrices of $xI - A$.
The number $n - r$ is the largest $k$ such that $\delta_k(xI - A) = 1$. The minimal
polynomial $p_1$ is the characteristic polynomial for $A$ divided by the greatest
common divisor of the determinants of all $(n - 1) \times (n - 1)$ submatrices
of $xI - A$, etc.
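The corollary turns the uniqueness of the normal form into a computation. The following Python sketch assumes that the entries of $A$ are rational numbers, represents a polynomial as a list of `Fraction` coefficients (lowest degree first, with the empty list as the zero polynomial), and computes each $\delta_k(xI - A)$ by brute force over all $k \times k$ minors using a Laplace-expansion determinant. The helper names (`delta`, `invariant_factors`, and so on) are hypothetical, and the method is practical only for very small $n$; it is not the elimination procedure of Theorem 9.

```python
from fractions import Fraction
from itertools import combinations

# Polynomials over Q are coefficient lists, lowest degree first, e.g. x^2 - 1 is
# [-1, 0, 1]. The zero polynomial is the empty list [].

def trim(p):
    """Convert coefficients to Fraction and drop trailing zero coefficients."""
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    r = [0] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def divmod_poly(p, q):
    """Quotient and remainder of p by a non-zero polynomial q."""
    p, q = trim(p), trim(q)
    quot, rem = [0] * max(len(p) - len(q) + 1, 0), p
    while len(rem) >= len(q):
        coef, shift = rem[-1] / q[-1], len(rem) - len(q)
        quot[shift] = coef
        for i, b in enumerate(q):
            rem[shift + i] -= coef * b
        rem = trim(rem)
    return trim(quot), rem

def gcd(p, q):
    """Monic gcd by the Euclidean algorithm; gcd(0, 0) is 0 (the empty list)."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return [c / p[-1] for c in p] if p else []

def det(M):
    """Determinant of a square matrix of polynomials, by Laplace expansion."""
    if not M:
        return [Fraction(1)]
    total = []
    for j, entry in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        term = mul(entry, det(minor))
        total = add(total, term if j % 2 == 0 else mul([-1], term))
    return total

def delta(M, k):
    """delta_k(M): monic gcd of the k x k minors D_{I,J}(M), I and J increasing."""
    g = []
    for I in combinations(range(len(M)), k):
        for J in combinations(range(len(M[0])), k):
            g = gcd(g, det([[M[i][j] for j in J] for i in I]))
    return g

def char_matrix(A):
    """The polynomial matrix xI - A for a square matrix A with rational entries."""
    n = len(A)
    return [[trim([-A[i][j], 1 if i == j else 0]) for j in range(n)]
            for i in range(n)]

def invariant_factors(A):
    """Diagonal of the normal form of xI - A, via f_k = delta_k / delta_{k-1}."""
    M, prev, fs = char_matrix(A), [Fraction(1)], []
    for k in range(1, len(A) + 1):
        d = delta(M, k)
        q, r = divmod_poly(d, prev)
        assert not r  # delta_{k-1} divides delta_k
        fs.append(q)
        prev = d
    return fs
```

For example, `invariant_factors([[0, 1], [0, 0]])` compares equal to `[[1], [0, 0, 1]]`, that is, the diagonal entries $1, x^2$; for the $3 \times 3$ matrix with a single $2 \times 2$ Jordan block for the characteristic value 1 it gives $1, x - 1, (x - 1)^2$.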


Exercises 


1. True or false? Every matrix in $F[x]^{n \times n}$ is row-equivalent to an upper-triangular
matrix.


2. Let $T$ be a linear operator on a finite-dimensional vector space and let $A$ be
the matrix of $T$ in some ordered basis. Then $T$ has a cyclic vector if and only if
the determinants of the $(n - 1) \times (n - 1)$ submatrices of $xI - A$ are relatively
prime.


3. Let $A$ be an $n \times n$ matrix with entries in the field $F$ and let $f_1, \dots, f_n$ be the
diagonal entries of the normal form of $xI - A$. For which matrices $A$ is $f_1 \neq 1$?


4. Construct a linear operator $T$ with minimal polynomial $x^2(x - 1)^2$ and
characteristic polynomial $x^3(x - 1)^4$. Describe the primary decomposition of the vector
space under $T$ and find the projections on the primary components. Find a basis
in which the matrix of $T$ is in Jordan form. Also find an explicit direct sum
decomposition of the space into $T$-cyclic subspaces as in Theorem 3 and give the invariant
factors.


5. Let $T$ be the linear operator on $R^8$ which is represented in the standard
basis by the matrix

$$
A = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 \\
? & ? & ? & ? & ? & ? & ? & ? \\
0 & 0 & 0 & 1 & 1 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
0 & -1 & -1 & -1 & -1 & 0 & 1 & -1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}.
$$

(a) Find the characteristic polynomial and the invariant factors.

(b) Find the primary decomposition of $R^8$ under $T$ and the projections on
the primary components. Find cyclic decompositions of each primary component
as in Theorem 3.

(c) Find the Jordan form of $A$.

(d) Find a direct-sum decomposition of $R^8$ into $T$-cyclic subspaces as in
Theorem 3. (Hint: One way to do this is to use the results in (b) and an appropriate
generalization of the ideas discussed in Example 4.)


7.5. Summary; Semi-Simple Operators 


In the last two chapters, we have been dealing with a single linear 
operator T on a finite-dimensional vector space V. The program has been 
to decompose T into a direct sum of linear operators of an elementary 
nature, for the purpose of gaining detailed information about how T 
‘operates’ on the space V. Let us review briefly where we stand. 

We began to study T by means of characteristic values and charac- 
teristic vectors. We introduced diagonalizable operators, the operators 
which can be completely described in terms of characteristic values and 
vectors. We then observed that T might not have a single characteristic 
vector. Even in the case of an algebraically closed scalar field, when every 
linear operator does have at least one characteristic vector, we noted that 
the characteristic vectors of T need not span the space. 

We then proved the cyclic decomposition theorem, expressing any 
linear operator as the direct sum of operators with a cyclic vector, with 
no assumption about the scalar field. If $U$ is a linear operator with a cyclic
vector, there is a basis $\{\alpha_1, \dots, \alpha_n\}$ with

$$
\begin{aligned}
U\alpha_j &= \alpha_{j+1}, \qquad j = 1, \dots, n - 1 \\
U\alpha_n &= -c_0\alpha_1 - c_1\alpha_2 - \cdots - c_{n-1}\alpha_n.
\end{aligned}
$$

The action of $U$ on this basis is then to shift each $\alpha_j$ to the next vector
$\alpha_{j+1}$, except that $U\alpha_n$ is some prescribed linear combination of the vectors
in the basis. Since the general linear operator $T$ is the direct sum of a
finite number of such operators $U$, we obtained an explicit and reasonably
elementary description of the action of $T$.
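For a concrete picture of such an operator $U$, here is a small Python sketch (the helper names are hypothetical) that writes down the matrix of $U$ in the basis $\{\alpha_1, \dots, \alpha_n\}$: column $j$ holds the coordinates of $U\alpha_j$, so ones appear just below the diagonal and the last column is $-c_0, \dots, -c_{n-1}$. It then checks, by Horner's rule, that $p(U) = 0$ for $p = c_0 + c_1x + \cdots + c_{n-1}x^{n-1} + x^n$, as the Cayley-Hamilton theorem predicts.

```python
def companion(c):
    """Matrix of U in the basis alpha_1, ..., alpha_n, where c = [c_0, ..., c_{n-1}].

    Column j (0-based) holds the coordinates of U applied to the (j+1)-st basis vector."""
    n = len(c)
    U = [[0] * n for _ in range(n)]
    for j in range(n - 1):
        U[j + 1][j] = 1              # U alpha_j = alpha_{j+1}
    for i in range(n):
        U[i][n - 1] = -c[i]          # U alpha_n = -c_0 alpha_1 - ... - c_{n-1} alpha_n
    return U

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def evaluate(c, U):
    """p(U) for p = c_0 + c_1 x + ... + c_{n-1} x^{n-1} + x^n, by Horner's rule."""
    n = len(U)
    R = [[0] * n for _ in range(n)]
    for a in reversed(list(c) + [1]):
        R = matmul(R, U)
        for i in range(n):
            R[i][i] += a
    return R

c = [6, -5, 2]                       # p = x^3 + 2x^2 - 5x + 6, an arbitrary example
U = companion(c)
print(U)                             # [[0, 0, -6], [1, 0, 5], [0, 1, -2]]
print(evaluate(c, U))                # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```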

We next applied the cyclic decomposition theorem to nilpotent 
operators. For the case of an algebraically closed scalar field, we combined 
this with the primary decomposition theorem to obtain the Jordan form. 
The Jordan form gives a basis $\{\alpha_1, \dots, \alpha_n\}$ for the space $V$ such that,
for each $j$, either $T\alpha_j$ is a scalar multiple of $\alpha_j$ or $T\alpha_j = c\alpha_j + \alpha_{j+1}$. Such
a basis certainly describes the action of $T$ in an explicit and elementary
manner.

The importance of the rational form (or the Jordan form) derives 
from the fact that it exists, rather than from the fact that it can be com- 
puted in specific cases. Of course, if one is given a specific linear operator 
T and can compute its cyclic or Jordan form, that is the thing to do; 
for, having such a form, one can reel off vast amounts of information
about $T$. Two different types of difficulties arise in the computation of
such standard forms. One difficulty is, of course, the length of the com- 
putations. The other difficulty is that there may not be any method for 
doing the computations, even if one has the necessary time and patience. 
The second difficulty arises in, say, trying to find the Jordan form of a 
complex matrix. There simply is no well-defined method for factoring the 
characteristic polynomial, and thus one is stopped at the outset. The 
rational form does not suffer from this difficulty. As we showed in Section 
7.4, there is a well-defined method for finding the rational form of a given 
$n \times n$ matrix; however, such computations are usually extremely lengthy.

In our summary of the results of these last two chapters, we have not 
yet mentioned one of the theorems which we proved. This is the theorem 
which states that if T is a linear operator on a finite-dimensional vector 
space over an algebraically closed field, then T is uniquely expressible as 
the sum of a diagonalizable operator and a nilpotent operator which 
commute. This was proved from the primary decomposition theorem and 
certain information about diagonalizable operators. It is not as deep a 
theorem as the cyclic decomposition theorem or the existence of the 
Jordan form, but it does have important and useful applications in certain 
parts of mathematics. In concluding this chapter, we shall prove an 
analogous theorem, without assuming that the scalar field is algebraically 
closed. We begin by defining the operators which will play the role of the 
diagonalizable operators. 


Definition. Let V be a finite-dimensional vector space over the field F, 
and let T be a linear operator on V. We say that T is semi-simple if every 
T-invariant subspace has a complementary T-invariant subspace. 


What we are about to prove is that, with some restriction on the 
field F, every linear operator T is uniquely expressible in the form T = 
S + N, where S is semi-simple, N is nilpotent, and SN = NS. First, 
we are going to characterize semi-simple operators by means of their 
minimal polynomials, and this characterization will show us that, when F 
is algebraically closed, an operator is semi-simple if and only if it is 
diagonalizable. 


Lemma. Let $T$ be a linear operator on the finite-dimensional vector
space $V$, and let $V = W_1 \oplus \cdots \oplus W_k$ be the primary decomposition for $T$.
In other words, if $p$ is the minimal polynomial for $T$ and $p = p_1^{r_1} \cdots p_k^{r_k}$ is
the prime factorization of $p$, then $W_j$ is the null space of $p_j(T)^{r_j}$. Let $W$ be
any subspace of $V$ which is invariant under $T$. Then

$$
W = (W \cap W_1) \oplus \cdots \oplus (W \cap W_k).
$$


Proof. For the proof we need to recall a corollary to our proof
of the primary decomposition theorem in Section 6.8. If $E_1, \dots, E_k$ are
the projections associated with the decomposition $V = W_1 \oplus \cdots \oplus W_k$,
then each $E_j$ is a polynomial in $T$. That is, there are polynomials $h_1, \dots, h_k$
such that $E_j = h_j(T)$.

Now let $W$ be a subspace which is invariant under $T$. If $\alpha$ is any
vector in $W$, then $\alpha = \alpha_1 + \cdots + \alpha_k$, where $\alpha_j$ is in $W_j$. Now $\alpha_j = E_j\alpha =
h_j(T)\alpha$, and since $W$ is invariant under $T$, each $\alpha_j$ is also in $W$. Thus each
vector $\alpha$ in $W$ is of the form $\alpha = \alpha_1 + \cdots + \alpha_k$, where $\alpha_j$ is in the
intersection $W \cap W_j$. This expression is unique, since $V = W_1 \oplus \cdots \oplus W_k$.
Therefore

$$
W = (W \cap W_1) \oplus \cdots \oplus (W \cap W_k). \qquad ∎
$$


Lemma. Let T be a linear operator on V, and suppose that the minimal 
polynomial for T is irreducible over the scalar field F. Then T is semi-simple. 


Proof. Let $W$ be a subspace of $V$ which is invariant under $T$.
We must prove that $W$ has a complementary $T$-invariant subspace.
According to a corollary of Theorem 3, it will suffice to prove that if $f$ is
a polynomial and $\beta$ is a vector in $V$ such that $f(T)\beta$ is in $W$, then there is
a vector $\alpha$ in $W$ with $f(T)\beta = f(T)\alpha$. So suppose $\beta$ is in $V$ and $f$ is a
polynomial such that $f(T)\beta$ is in $W$. If $f(T)\beta = 0$, we let $\alpha = 0$ and then $\alpha$ is a
vector in $W$ with $f(T)\beta = f(T)\alpha$. If $f(T)\beta \neq 0$, the polynomial $f$ is not
divisible by the minimal polynomial $p$ of the operator $T$. Since $p$ is prime,
this means that $f$ and $p$ are relatively prime, and there exist polynomials
$g$ and $h$ such that $fg + ph = 1$. Because $p(T) = 0$, we then have
$f(T)g(T) = I$. From this it follows that the vector $\beta$ must itself be in the
subspace $W$; for

$$
\beta = g(T)f(T)\beta = g(T)\big(f(T)\beta\big)
$$

while $f(T)\beta$ is in $W$ and $W$ is invariant under $T$. Take $\alpha = \beta$. ∎


Theorem 11. Let $T$ be a linear operator on the finite-dimensional vector
space $V$. A necessary and sufficient condition that $T$ be semi-simple is that
the minimal polynomial $p$ for $T$ be of the form $p = p_1 \cdots p_k$, where $p_1, \dots, p_k$
are distinct irreducible polynomials over the scalar field $F$.


Proof. Suppose $T$ is semi-simple. We shall show that no irreducible
polynomial is repeated in the prime factorization of the minimal
polynomial $p$. Suppose the contrary. Then there is some non-scalar monic
polynomial $g$ such that $g^2$ divides $p$. Let $W$ be the null space of the
operator $g(T)$. Then $W$ is invariant under $T$. Now $p = g^2h$ for some
polynomial $h$. Since $g$ is not a scalar polynomial, $gh$ has smaller degree than
the minimal polynomial $p$, so the operator $g(T)h(T)$ is not
the zero operator, and there is some vector $\beta$ in $V$ such that $g(T)h(T)\beta \neq 0$,
i.e., $(gh)\beta \neq 0$. Now $(gh)\beta$ is in the subspace $W$, since $g(gh\beta) = g^2h\beta =
p\beta = 0$. But there is no vector $\alpha$ in $W$ such that $gh\beta = gh\alpha$; for, if $\alpha$ is in $W$,

$$
(gh)\alpha = (hg)\alpha = h(g\alpha) = h(0) = 0.
$$

Thus, $W$ cannot have a complementary $T$-invariant subspace, contradicting
the hypothesis that $T$ is semi-simple.

Now suppose the prime factorization of $p$ is $p = p_1 \cdots p_k$, where
$p_1, \dots, p_k$ are distinct irreducible (non-scalar) monic polynomials. Let
$W$ be a subspace of $V$ which is invariant under $T$. We shall prove that $W$
has a complementary $T$-invariant subspace. Let $V = W_1 \oplus \cdots \oplus W_k$
be the primary decomposition for $T$, i.e., let $W_j$ be the null space of $p_j(T)$.
Let $T_j$ be the linear operator induced on $W_j$ by $T$, so that the minimal
polynomial for $T_j$ is the prime $p_j$. Now $W \cap W_j$ is a subspace of $W_j$ which
is invariant under $T_j$ (or under $T$). By the last lemma, there is a subspace
$V_j$ of $W_j$ such that $W_j = (W \cap W_j) \oplus V_j$ and $V_j$ is invariant under $T_j$
(and hence under $T$). Then we have

$$
\begin{aligned}
V &= W_1 \oplus \cdots \oplus W_k \\
&= (W \cap W_1) \oplus V_1 \oplus \cdots \oplus (W \cap W_k) \oplus V_k \\
&= (W \cap W_1) + \cdots + (W \cap W_k) \oplus V_1 \oplus \cdots \oplus V_k.
\end{aligned}
$$

By the first lemma above, $W = (W \cap W_1) \oplus \cdots \oplus (W \cap W_k)$, so that
if $W' = V_1 \oplus \cdots \oplus V_k$, then $V = W \oplus W'$ and $W'$ is invariant under
$T$. ∎


Corollary. If $T$ is a linear operator on a finite-dimensional vector space
over an algebraically closed field, then $T$ is semi-simple if and only if $T$ is
diagonalizable.


Proof. If the scalar field $F$ is algebraically closed, the monic
primes over $F$ are the polynomials $x - c$. In this case, $T$ is semi-simple
if and only if the minimal polynomial for $T$ is $p = (x - c_1) \cdots (x - c_k)$,
where $c_1, \dots, c_k$ are distinct elements of $F$. This is precisely the criterion
for $T$ to be diagonalizable, which we established in Chapter 6. ∎


We should point out that T is semi-simple if and only if there is some 
polynomial f, which is a product of distinct primes, such that f(T) = 0. 
This is only superficially different from the condition that the minimal 
polynomial be a product of distinct primes. 

We turn now to expressing a linear operator as the sum of a semi- 
simple operator and a nilpotent operator which commute. In this, we 
shall restrict the scalar field to a subfield of the complex numbers. The 
informed reader will see that what is important is that the field F be a 
field of characteristic zero, that is, that for each positive integer n the 
sum $1 + \cdots + 1$ ($n$ times) in $F$ should not be 0. For a polynomial $f$ over
$F$, we denote by $f^{(k)}$ the $k$th formal derivative of $f$. In other words,
$f^{(k)} = D^k f$, where $D$ is the differentiation operator on the space of
polynomials. If $g$ is another polynomial, $f(g)$ denotes the result of substituting
$g$ in $f$, i.e., the polynomial obtained by applying $f$ to the element $g$ in the
linear algebra $F[x]$.
Lemma (Taylor's Formula). Let $F$ be a field of characteristic zero
and let $g$ and $h$ be polynomials over $F$. If $f$ is any polynomial over $F$ with
$\deg f \le n$, then

$$
f(g) = f(h) + f^{(1)}(h)(g - h) + \frac{f^{(2)}(h)}{2!}(g - h)^2 + \cdots + \frac{f^{(n)}(h)}{n!}(g - h)^n.
$$


Proof. What we are proving is a generalized Taylor formula. The
reader is probably used to seeing the special case in which $h = c$, a scalar
polynomial, and $g = x$. Then the formula says

$$
f = f(x) = f(c) + f^{(1)}(c)(x - c) + \frac{f^{(2)}(c)}{2!}(x - c)^2 + \cdots + \frac{f^{(n)}(c)}{n!}(x - c)^n.
$$

The proof of the general formula is just an application of the binomial
theorem

$$
(a + b)^k = a^k + k a^{k-1} b + \frac{k(k - 1)}{2} a^{k-2} b^2 + \cdots + b^k.
$$

The reader should see that, since substitution and differentiation are
linear processes, one need only prove the formula when $f = x^k$. The
formula for $f = \sum_{k=0}^{n} c_k x^k$ follows by a linear combination. In the case $f = x^k$
with $k \le n$, the formula says

$$
g^k = h^k + k h^{k-1}(g - h) + \frac{k(k - 1)}{2} h^{k-2}(g - h)^2 + \cdots + (g - h)^k
$$

which is just the binomial expansion of

$$
g^k = [h + (g - h)]^k. \qquad ∎
$$
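Taylor's formula can be checked mechanically in a small case. The sketch below, in Python with exact rational arithmetic, compares $f(g)$ with the right-hand side $\sum_k f^{(k)}(h)(g - h)^k / k!$; the particular $f$, $g$, $h$ and the helper names are illustrative assumptions. The division by $k!$ is the step that needs characteristic zero.

```python
from fractions import Fraction
from math import factorial

# Polynomials over Q as coefficient lists, lowest degree first; [] is zero.

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def mul(p, q):
    r = [0] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def scale(c, p):
    return trim([c * a for a in p])

def derivative(p):
    """Formal derivative D p."""
    return trim([i * a for i, a in enumerate(p)][1:])

def compose(f, g):
    """f(g): substitute g into f by Horner's rule."""
    out = []
    for a in reversed(trim(f)):
        out = add(mul(out, g), [a])
    return out

def taylor_sum(f, g, h):
    """sum over k = 0..deg f of f^(k)(h) (g - h)^k / k!."""
    f = trim(f)
    diff = add(g, scale(-1, h))
    total, fk, power = [], f, [Fraction(1)]
    for k in range(len(f)):
        term = mul(compose(fk, h), power)
        total = add(total, scale(Fraction(1, factorial(k)), term))
        fk, power = derivative(fk), mul(power, diff)
    return total

f = [1, 2, 0, 3]     # f = 1 + 2x + 3x^3
g = [0, 0, 1]        # g = x^2
h = [5, 1]           # h = x + 5
print(taylor_sum(f, g, h) == compose(f, g))   # True
```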


Lemma. Let $F$ be a subfield of the complex numbers, let $f$ be a
polynomial over $F$, and let $f'$ be the derivative of $f$. The following are equivalent:

(a) $f$ is the product of distinct polynomials irreducible over $F$.
(b) $f$ and $f'$ are relatively prime.
(c) As a polynomial with complex coefficients, $f$ has no repeated root.


Proof. Let us first prove that (a) and (b) are equivalent statements
about $f$. Suppose in the prime factorization of $f$ over the field $F$ that
some (non-scalar) prime polynomial $p$ is repeated. Then $f = p^2h$ for some
$h$ in $F[x]$. Then

$$
f' = p^2h' + 2pp'h
$$

and $p$ is also a divisor of $f'$. Hence $f$ and $f'$ are not relatively prime. We
conclude that (b) implies (a).

Now suppose $f = p_1 \cdots p_k$, where $p_1, \dots, p_k$ are distinct non-scalar
irreducible polynomials over $F$. Let $f_j = f/p_j$. Then

$$
f' = p_1'f_1 + p_2'f_2 + \cdots + p_k'f_k.
$$

Let $p$ be a prime polynomial which divides both $f$ and $f'$. Then $p = p_i$ for
some $i$. Now $p_i$ divides $f_j$ for $j \neq i$, and since $p_i$ also divides

$$
f' = \sum_{j=1}^{k} p_j' f_j
$$

we see that $p_i$ must divide $p_i'f_i$. Therefore $p_i$ divides either $f_i$ or $p_i'$. But $p_i$
does not divide $f_i$ since $p_1, \dots, p_k$ are distinct. So $p_i$ divides $p_i'$. This is
not possible, since $p_i'$ is non-zero (because $F$ has characteristic zero) and has degree one less than the degree of $p_i$. We
conclude that no prime divides both $f$ and $f'$, or that $(f, f') = 1$.

To see that statement (c) is equivalent to (a) and (b), we need only 
observe the following: Suppose f and g are polynomials over F, a subfield 
of the complex numbers. We may also regard f and g as polynomials with 
complex coefficients. The statement that f and g are relatively prime as 
polynomials over F is equivalent to the statement that f and g are rela- 
tively prime as polynomials over the field of complex numbers. We leave 
the proof of this as an exercise. We use this fact with g = f’. Note that 
(c) is just (a) when f is regarded as a polynomial over the field of complex 
numbers. Thus (b) and (c) are equivalent, by the same argument that 
we used above. ∎
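Condition (b) gives a finite test for condition (a) that needs no factorization: compute $\gcd(f, f')$ with the Euclidean algorithm. Here is a minimal Python sketch over the rational numbers (so $F$ is taken to be $Q$, an assumption), with hypothetical helper names. By Theorem 11, applying the test to the minimal polynomial of $T$ decides whether $T$ is semi-simple, provided the minimal polynomial is already known.

```python
from fractions import Fraction

# Polynomials over Q as coefficient lists, lowest degree first; [] is zero.

def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def divmod_poly(p, q):
    """Quotient and remainder of p by a non-zero polynomial q."""
    p, q = trim(p), trim(q)
    quot, rem = [0] * max(len(p) - len(q) + 1, 0), p
    while len(rem) >= len(q):
        coef, shift = rem[-1] / q[-1], len(rem) - len(q)
        quot[shift] = coef
        for i, b in enumerate(q):
            rem[shift + i] -= coef * b
        rem = trim(rem)
    return trim(quot), rem

def derivative(p):
    """Formal derivative: c_i x^i becomes i c_i x^(i-1)."""
    return trim([i * c for i, c in enumerate(p)][1:])

def gcd(p, q):
    """Monic gcd by the Euclidean algorithm."""
    p, q = trim(p), trim(q)
    while q:
        p, q = q, divmod_poly(p, q)[1]
    return [c / p[-1] for c in p] if p else []

def has_distinct_prime_factors(f):
    """Condition (b) of the lemma: f and f' are relatively prime."""
    return gcd(f, derivative(f)) == [1]

print(has_distinct_prime_factors([1, 0, 1]))         # x^2 + 1: True
print(has_distinct_prime_factors([1, 0, 2, 0, 1]))   # (x^2 + 1)^2: False
print(has_distinct_prime_factors([0, -1, 0, 1]))     # x^3 - x: True
```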


We can now prove a theorem which makes the relation between semi- 
simple operators and diagonalizable operators even more apparent. 


Theorem 12. Let $F$ be a subfield of the field of complex numbers, let $V$
be a finite-dimensional vector space over $F$, and let $T$ be a linear operator on
$V$. Let $\mathcal{B}$ be an ordered basis for $V$ and let $A$ be the matrix of $T$ in the ordered
basis $\mathcal{B}$. Then $T$ is semi-simple if and only if the matrix $A$ is similar over the
field of complex numbers to a diagonal matrix.


Proof. Let $p$ be the minimal polynomial for $T$. According to
Theorem 11, $T$ is semi-simple if and only if $p = p_1 \cdots p_k$, where $p_1, \dots, p_k$
are distinct irreducible polynomials over $F$. By the last lemma, we see
that $T$ is semi-simple if and only if $p$ has no repeated complex root.

Now $p$ is also the minimal polynomial for the matrix $A$. We know
that $A$ is similar over the field of complex numbers to a diagonal matrix
if and only if its minimal polynomial has no repeated complex root. This
proves the theorem. ∎


Theorem 13. Let F be a subfield of the field of complex numbers, let V 
be a finite-dimensional vector space over F, and let T be a linear operator on V. 


There is a semi-simple operator S on V and a nilpotent operator N on V such 
that 


(i) $T = S + N$;
(ii) $SN = NS$.
Furthermore, the semi-simple $S$ and nilpotent $N$ satisfying (i) and (ii) are
unique, and each is a polynomial in $T$.


Proof. Let $p_1^{r_1} \cdots p_k^{r_k}$ be the prime factorization of the minimal
polynomial for $T$, and let $f = p_1 \cdots p_k$. Let $r$ be the greatest of the positive
integers $r_1, \dots, r_k$. Then the polynomial $f$ is a product of distinct primes,
$f^r$ is divisible by the minimal polynomial for $T$, and so

$$
f(T)^r = 0.
$$

We are going to construct a sequence of polynomials $g_0, g_1, g_2, \dots$
such that

$$
f\Big(x - \sum_{j=0}^{n} g_j f^j\Big)
$$

is divisible by $f^{n+1}$, $n = 0, 1, 2, \dots$. We take $g_0 = 0$ and then $f(x - g_0f^0) =
f(x) = f$ is divisible by $f$. Suppose we have chosen $g_0, \dots, g_{n-1}$. Let

$$
h = x - \sum_{j=0}^{n-1} g_j f^j
$$

so that, by assumption, $f(h)$ is divisible by $f^n$. We want to choose $g_n$ so that

$$
f(h - g_n f^n)
$$

is divisible by $f^{n+1}$. We apply the general Taylor formula and obtain

$$
f(h - g_n f^n) = f(h) - g_n f^n f'(h) + f^{n+1} b
$$

where $b$ is some polynomial. By assumption $f(h) = qf^n$. Thus, we see that
to have $f(h - g_nf^n)$ divisible by $f^{n+1}$ we need only choose $g_n$ in such a way
that $(q - g_nf')$ is divisible by $f$. (Since $g_0 = 0$, the polynomial $h - x$ is
divisible by $f$, hence so is $f'(h) - f'$; so it makes no difference whether we use
$f'(h)$ or $f'$ here.) This can be done, because $f$ has no repeated prime factors
and so $f$ and $f'$ are relatively prime. If $a$ and $e$ are
polynomials such that $af + ef' = 1$, and if we let $g_n = eq$, then $q - g_nf' = q(1 - ef') = aqf$
is divisible by $f$.

Now we have a sequence $g_0, g_1, \dots$ such that $f^{n+1}$ divides
$f\big(x - \sum_{j=0}^{n} g_j f^j\big)$. Let us take $n = r - 1$ and then, since $f(T)^r = 0$,

$$
f\Big(T - \sum_{j=0}^{r-1} g_j(T) f(T)^j\Big) = 0.
$$

Let

$$
N = \sum_{j=1}^{r-1} g_j(T) f(T)^j = \sum_{j=0}^{r-1} g_j(T) f(T)^j.
$$

Since $\sum_{j=1}^{r-1} g_j f^j$ is divisible by $f$, we see that $N^r = 0$ and $N$ is nilpotent. Let
$S = T - N$. Then $f(S) = f(T - N) = 0$. Since $f$ has distinct prime
factors, $S$ is semi-simple.

Now we have $T = S + N$ where $S$ is semi-simple, $N$ is nilpotent,
and each is a polynomial in $T$. To prove the uniqueness statement, we
shall pass from the scalar field $F$ to the field of complex numbers. Let $\mathcal{B}$
be some ordered basis for the space $V$. Then we have

$$
[T]_{\mathcal{B}} = [S]_{\mathcal{B}} + [N]_{\mathcal{B}}
$$

while $[S]_{\mathcal{B}}$ is diagonalizable over the complex numbers and $[N]_{\mathcal{B}}$ is
nilpotent. This diagonalizable matrix and nilpotent matrix which commute
are uniquely determined, as we have shown in Chapter 6. ∎
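The proof of Theorem 13 is constructive, and the construction can be carried out exactly over the rational numbers. The following Python sketch follows it under two assumptions, both labeled in the code: the caller supplies $f = p_1 \cdots p_k$ and the exponent $r$ (the sketch does not compute or factor the minimal polynomial), and each $g_n$ is reduced modulo $f$, which is harmless because only the residue of $g_n$ modulo $f$ enters the divisibility condition. The inverse $e$ of $f'$ modulo $f$ comes from the extended Euclidean algorithm. All helper names are hypothetical.

```python
from fractions import Fraction

# Polynomials over Q as coefficient lists, lowest degree first; [] is zero.
def trim(p):
    p = [Fraction(c) for c in p]
    while p and p[-1] == 0:
        p.pop()
    return p

def add(p, q):
    n = max(len(p), len(q))
    return trim([(p[i] if i < len(p) else 0) + (q[i] if i < len(q) else 0)
                 for i in range(n)])

def scale(c, p):
    return trim([c * a for a in p])

def mul(p, q):
    r = [0] * max(len(p) + len(q) - 1, 0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return trim(r)

def divmod_poly(p, q):
    p, q = trim(p), trim(q)
    quot, rem = [0] * max(len(p) - len(q) + 1, 0), p
    while len(rem) >= len(q):
        coef, shift = rem[-1] / q[-1], len(rem) - len(q)
        quot[shift] = coef
        for i, b in enumerate(q):
            rem[shift + i] -= coef * b
        rem = trim(rem)
    return trim(quot), rem

def inverse_mod(a, m):
    """e with e*a = 1 (mod m), by the extended Euclidean algorithm; needs (a, m) = 1."""
    r0, r1 = trim(m), divmod_poly(a, m)[1]
    s0, s1 = [], [Fraction(1)]           # invariant: r_i = s_i * a (mod m)
    while r1:
        q, r = divmod_poly(r0, r1)
        r0, r1 = r1, r
        s0, s1 = s1, add(s0, scale(-1, mul(q, s1)))
    return divmod_poly(scale(1 / r0[0], s0), m)[1]   # r0 is a non-zero constant

def compose(f, g):
    out = []
    for a in reversed(f):
        out = add(mul(out, g), [a])
    return out

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def poly_at(p, T):
    """p(T) by Horner's rule."""
    n = len(T)
    R = [[Fraction(0)] * n for _ in range(n)]
    for a in reversed(p):
        R = matmul(R, T)
        for i in range(n):
            R[i][i] += a
    return R

def semisimple_nilpotent(T, f, r):
    """T = S + N as in the proof of Theorem 13.

    ASSUMED INPUTS: f = product of the distinct monic primes dividing the minimal
    polynomial of T, and r such that f**r is divisible by that minimal polynomial."""
    f = trim(f)
    e = inverse_mod(trim([i * a for i, a in enumerate(f)][1:]), f)  # e f' = 1 (mod f)
    h, fn = [Fraction(0), Fraction(1)], f                           # h = x, fn = f**1
    for n in range(1, r):
        q, rem = divmod_poly(compose(f, h), fn)     # f(h) = q f**n
        assert not rem
        g = divmod_poly(mul(e, q), f)[1]            # g_n = e q, reduced mod f (assumption)
        h = add(h, scale(-1, mul(g, fn)))           # h <- h - g_n f**n
        fn = mul(fn, f)
    S = poly_at(h, T)
    N = [[T[i][j] - S[i][j] for j in range(len(T))] for i in range(len(T))]
    return S, N

# Companion matrix of (x^2 + 1)^2 = x^4 + 2x^2 + 1: here f = x^2 + 1 and r = 2.
T = [[0, 0, 0, -1], [1, 0, 0, 0], [0, 1, 0, -2], [0, 0, 1, 0]]
S, N = semisimple_nilpotent(T, [1, 0, 1], 2)
```

In this example the sketch finds $S = \tfrac{3}{2}T + \tfrac{1}{2}T^3$ and $N = -\tfrac{1}{2}T(T^2 + I)$, which satisfy $N^2 = 0$, $SN = NS$, and $S^2 + I = 0$.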


Exercises 


1. If N is a nilpotent linear operator on V, show that for any polynomial f the 
semi-simple part of f(N) is a scalar multiple of the identity operator (F a subfield 
of C). 


2. Let F be a subfield of the complex numbers, V a finite-dimensional vector 
space over F, and T a semi-simple linear operator on V. If f is any polynomial 
over F, prove that f(T) is semi-simple. 


3. Let T be a linear operator on a finite-dimensional space over a subfield of C. 
Prove that T is semi-simple if and only if the following is true: If f is a polynomial 
and f(T) is nilpotent, then f(T) = 0. 


Inner Product Spaces


8.1. Inner Products 


Throughout this chapter we consider only real or complex vector 
spaces, that is, vector spaces over the field of real numbers or the field of 
complex numbers. Our main object is to study vector spaces in which it 
makes sense to speak of the ‘length’ of a vector and of the ‘angle’ between 
two vectors. We shall do this by studying a certain type of scalar-valued 
function on pairs of vectors, known as an inner product. One example of 
an inner product is the scalar or dot product of vectors in $R^3$. The scalar
product of the vectors

$$
\alpha = (x_1, x_2, x_3) \quad\text{and}\quad \beta = (y_1, y_2, y_3)
$$

in $R^3$ is the real number

$$
(\alpha|\beta) = x_1y_1 + x_2y_2 + x_3y_3.
$$


Geometrically, this dot product is the product of the length of $\alpha$, the
length of $\beta$, and the cosine of the angle between $\alpha$ and $\beta$. It is therefore
possible to define the geometric concepts of ‘length’ and ‘angle’ in $R^3$ by
means of the algebraically defined scalar product.

An inner product on a vector space is a function with properties
similar to the dot product in $R^3$, and in terms of such an inner product
one can also define ‘length’ and ‘angle.’ Our comments about the general
notion of angle will be restricted to the concept of perpendicularity (or
orthogonality) of vectors. In this first section we shall say what an inner
product is, consider some particular examples, and establish a few basic
properties of inner products. Then we turn to the task of discussing length
and orthogonality.


Definition. Let $F$ be the field of real numbers or the field of complex
numbers, and $V$ a vector space over $F$. An inner product on $V$ is a function
which assigns to each ordered pair of vectors $\alpha, \beta$ in $V$ a scalar $(\alpha|\beta)$ in $F$ in
such a way that for all $\alpha, \beta, \gamma$ in $V$ and all scalars $c$

(a) $(\alpha + \beta|\gamma) = (\alpha|\gamma) + (\beta|\gamma)$;

(b) $(c\alpha|\beta) = c(\alpha|\beta)$;

(c) $(\beta|\alpha) = \overline{(\alpha|\beta)}$, the bar denoting complex conjugation;

(d) $(\alpha|\alpha) > 0$ if $\alpha \neq 0$.


It should be observed that conditions (a), (b), and (c) imply that

(e) $(\alpha|c\beta + \gamma) = \bar{c}(\alpha|\beta) + (\alpha|\gamma)$.

One other point should be made. When $F$ is the field $R$ of real numbers,
the complex conjugates appearing in (c) and (e) are superfluous; however,
in the complex case they are necessary for the consistency of the
conditions. Without these complex conjugates, we would have the contradiction:

$$
(\alpha|\alpha) > 0 \quad\text{and}\quad (i\alpha|i\alpha) = -1(\alpha|\alpha) > 0.
$$

In the examples that follow and throughout the chapter, $F$ is either
the field of real numbers or the field of complex numbers.


EXAMPLE 1. On $F^n$ there is an inner product which we call the
standard inner product. It is defined on $\alpha = (x_1, \dots, x_n)$ and
$\beta = (y_1, \dots, y_n)$ by

$$
(\alpha|\beta) = \sum_j x_j \bar{y}_j. \tag{8-1}
$$

When $F = R$, this may also be written

$$
(\alpha|\beta) = \sum_j x_j y_j.
$$

In the real case, the standard inner product is often called the dot or
scalar product and denoted by $\alpha \cdot \beta$.


EXAMPLE 2. For $\alpha = (x_1, x_2)$ and $\beta = (y_1, y_2)$ in $R^2$, let

$$
(\alpha|\beta) = x_1y_1 - x_2y_1 - x_1y_2 + 4x_2y_2.
$$

Since $(\alpha|\alpha) = (x_1 - x_2)^2 + 3x_2^2$, it follows that $(\alpha|\alpha) > 0$ if $\alpha \neq 0$.
Conditions (a), (b), and (c) of the definition are easily verified.
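A quick numerical spot-check of Example 2 (Python; the function name is hypothetical) verifies the identity $(\alpha|\alpha) = (x_1 - x_2)^2 + 3x_2^2$ and the positivity condition (d) on a small grid of integer vectors. It is a sanity check, not a proof.

```python
from itertools import product

def inner_ex2(alpha, beta):
    """(alpha|beta) = x1 y1 - x2 y1 - x1 y2 + 4 x2 y2 on R^2, as in Example 2."""
    (x1, x2), (y1, y2) = alpha, beta
    return x1 * y1 - x2 * y1 - x1 * y2 + 4 * x2 * y2

# Check the completed square and condition (d) on a small integer grid.
for x1, x2 in product(range(-3, 4), repeat=2):
    q = inner_ex2((x1, x2), (x1, x2))
    assert q == (x1 - x2) ** 2 + 3 * x2 ** 2
    assert (q > 0) == ((x1, x2) != (0, 0))
```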


EXAMPLE 3. Let $V$ be $F^{n \times n}$, the space of all $n \times n$ matrices over $F$.
Then $V$ is isomorphic to $F^{n^2}$ in a natural way. It therefore follows from
Example 1 that the equation

$$
(A|B) = \sum_{j,k} A_{jk} \bar{B}_{jk}
$$

defines an inner product on $V$. Furthermore, if we introduce the conjugate
transpose matrix $B^*$, where $B^*_{kj} = \bar{B}_{jk}$, we may express this inner product
on $F^{n \times n}$ in terms of the trace function:

$$
(A|B) = \operatorname{tr}(AB^*) = \operatorname{tr}(B^*A).
$$

For

$$
\begin{aligned}
\operatorname{tr}(AB^*) &= \sum_j (AB^*)_{jj} \\
&= \sum_j \sum_k A_{jk} B^*_{kj} \\
&= \sum_j \sum_k A_{jk} \bar{B}_{jk}.
\end{aligned}
$$
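The identity $(A|B) = \operatorname{tr}(AB^*) = \operatorname{tr}(B^*A)$ is easy to confirm on a small example. The Python sketch below uses built-in complex numbers; the matrices and helper names are arbitrary choices for illustration.

```python
def conj_transpose(B):
    """B* with (B*)_{kj} = conj(B_{jk})."""
    return [[B[j][k].conjugate() for j in range(len(B))] for k in range(len(B[0]))]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def inner_entries(A, B):
    """(A|B) = sum over j, k of A_jk conj(B_jk): the standard inner product on F^(n^2)."""
    return sum(a * b.conjugate() for ra, rb in zip(A, B) for a, b in zip(ra, rb))

A = [[1 + 2j, 3], [0, -1j]]
B = [[2, 1j], [1 - 1j, 4]]
print(inner_entries(A, B))                   # (2-3j)
print(trace(matmul(A, conj_transpose(B))))   # (2-3j)
print(trace(matmul(conj_transpose(B), A)))   # (2-3j)
```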


EXAMPLE 4. Let $F^{n \times 1}$ be the space of $n \times 1$ (column) matrices over
$F$, and let $Q$ be an $n \times n$ invertible matrix over $F$. For $X, Y$ in $F^{n \times 1}$ set

$$
(X|Y) = Y^*Q^*QX.
$$

We are identifying the $1 \times 1$ matrix on the right with its single entry.
When $Q$ is the identity matrix, this inner product is essentially the same
as that in Example 1; we call it the standard inner product on $F^{n \times 1}$.
The reader should note that the terminology ‘standard inner product’ is
used in two special contexts. For a general finite-dimensional vector space
over $F$, there is no obvious inner product that one may call standard.


EXAMPLE 5. Let $V$ be the vector space of all continuous complex-valued
functions on the unit interval, $0 \le t \le 1$. Let

$$
(f|g) = \int_0^1 f(t)\overline{g(t)}\,dt.
$$

The reader is probably more familiar with the space of real-valued
continuous functions on the unit interval, and for this space the complex
conjugate on $g$ may be omitted.


EXAMPLE 6. This is really a whole class of examples. One may
construct new inner products from a given one by the following method.
Let $V$ and $W$ be vector spaces over $F$ and suppose $(\,|\,)$ is an inner product
on $W$. If $T$ is a non-singular linear transformation from $V$ into $W$, then
the equation

$$
p_T(\alpha, \beta) = (T\alpha|T\beta)
$$

defines an inner product $p_T$ on $V$. The inner product in Example 4 is a
special case of this situation. The following are also special cases.


(a) Let $V$ be a finite-dimensional vector space, and let

$$
\mathcal{B} = \{\alpha_1, \dots, \alpha_n\}
$$

be an ordered basis for $V$. Let $\epsilon_1, \dots, \epsilon_n$ be the standard basis vectors in
$F^n$, and let $T$ be the linear transformation from $V$ into $F^n$ such that
$T\alpha_j = \epsilon_j$, $j = 1, \dots, n$. In other words, let $T$ be the ‘natural’ isomorphism of $V$
onto $F^n$ that is determined by $\mathcal{B}$. If we take the standard inner product
on $F^n$, then

$$
p_T\Big(\sum_j x_j\alpha_j, \sum_k y_k\alpha_k\Big) = \sum_{j=1}^{n} x_j \bar{y}_j.
$$

Thus, for any basis for $V$ there is an inner product on $V$ with the property
$(\alpha_j|\alpha_k) = \delta_{jk}$; in fact, it is easy to show that there is exactly one such
inner product. Later we shall show that every inner product on $V$ is
determined by some basis $\mathcal{B}$ in the above manner.

(b) We look again at Example 5 and take $V = W$, the space of
continuous functions on the unit interval. Let $T$ be the linear operator
‘multiplication by $t$,’ that is, $(Tf)(t) = tf(t)$, $0 \le t \le 1$. It is easy to see
that $T$ is linear. Also $T$ is non-singular; for suppose $Tf = 0$. Then $tf(t) = 0$
for $0 \le t \le 1$; hence $f(t) = 0$ for $t > 0$. Since $f$ is continuous, we have
$f(0) = 0$ as well, or $f = 0$. Now using the inner product of Example 5,
we construct a new inner product on $V$ by setting

$$
\begin{aligned}
p_T(f, g) &= \int_0^1 (Tf)(t)\overline{(Tg)(t)}\,dt \\
&= \int_0^1 f(t)\overline{g(t)}\,t^2\,dt.
\end{aligned}
$$

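We turn now to some general observations about inner products.
Suppose $V$ is a complex vector space with an inner product. Then for all
$\alpha, \beta$ in $V$

$$
(\alpha|\beta) = \operatorname{Re}(\alpha|\beta) + i\operatorname{Im}(\alpha|\beta)
$$

where $\operatorname{Re}(\alpha|\beta)$ and $\operatorname{Im}(\alpha|\beta)$ are the real and imaginary parts of the
complex number $(\alpha|\beta)$. If $z$ is a complex number, then $\operatorname{Im}(z) = \operatorname{Re}(-iz)$.
It follows that

$$
\operatorname{Im}(\alpha|\beta) = \operatorname{Re}[-i(\alpha|\beta)] = \operatorname{Re}(\alpha|i\beta).
$$

Thus the inner product is completely determined by its ‘real part’ in
accordance with

$$
(\alpha|\beta) = \operatorname{Re}(\alpha|\beta) + i\operatorname{Re}(\alpha|i\beta). \tag{8-2}
$$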


Occasionally it is very useful to know that an inner product on a real
or complex vector space is determined by another function, the so-called
quadratic form determined by the inner product. To define it, we first
denote the positive square root of $(\alpha|\alpha)$ by $\|\alpha\|$; $\|\alpha\|$ is called the norm
of $\alpha$ with respect to the inner product. By looking at the standard inner
products in $R^1$, $C^1$, $R^2$, and $R^3$, the reader should be able to convince
himself that it is appropriate to think of the norm of $\alpha$ as the ‘length’ or
‘magnitude’ of $\alpha$. The quadratic form determined by the inner product
is the function that assigns to each vector $\alpha$ the scalar $\|\alpha\|^2$. It follows
from the properties of the inner product that

$$
\|\alpha \pm \beta\|^2 = \|\alpha\|^2 \pm 2\operatorname{Re}(\alpha|\beta) + \|\beta\|^2
$$

for all vectors $\alpha$ and $\beta$. Thus in the real case

$$
(\alpha|\beta) = \tfrac{1}{4}\|\alpha + \beta\|^2 - \tfrac{1}{4}\|\alpha - \beta\|^2. \tag{8-3}
$$

In the complex case we use (8-2) to obtain the more complicated expression

$$
(\alpha|\beta) = \tfrac{1}{4}\|\alpha + \beta\|^2 - \tfrac{1}{4}\|\alpha - \beta\|^2 + \tfrac{i}{4}\|\alpha + i\beta\|^2 - \tfrac{i}{4}\|\alpha - i\beta\|^2. \tag{8-4}
$$


Equations (8-3) and (8-4) are called the polarization identities. Note 
that (8-4) may also be written as follows: 


bend . 
(ala) = 5 È, i Ile + el 
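A quick numerical sanity check of the last identity may help. The sketch below assumes the standard inner product on $C^n$, $(\alpha|\beta) = \sum_k a_k\overline{b_k}$, and recovers $(\alpha|\beta)$ from the quadratic form alone; the function names and sample vectors are illustrative only.

```python
def inner(a, b):
    """Standard inner product on C^n: linear in a, conjugate-linear in b."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))


def norm_sq(a):
    """Quadratic form ||a||^2 = (a|a); the value is real, so keep only .real."""
    return inner(a, a).real


def polarize(a, b):
    """Recover (a|b) from norms alone: (1/4) * sum_{n=1}^{4} i^n ||a + i^n b||^2."""
    total = 0
    for n in range(1, 5):
        c = 1j ** n
        total += c * norm_sq([x + c * y for x, y in zip(a, b)])
    return total / 4


alpha = [1 + 2j, -1j, 3]
beta = [2 - 1j, 4, 1 + 1j]
print(inner(alpha, beta))     # (3-2j)
print(polarize(alpha, beta))  # agrees with the line above up to rounding
```

Only norms enter `polarize`, which is the point of (8-4): the quadratic form determines the inner product.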


The properties obtained above hold for any inner product on a real 
or complex vector space V, regardless of its dimension. We turn now to 
the case in which V is finite-dimensional. As one might guess, an inner 
product on a finite-dimensional space may always be described in terms 
of an ordered basis by means of a matrix. 

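Suppose that V is finite-dimensional, that

$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$$

is an ordered basis for V, and that we are given a particular inner product
on V; we shall show that the inner product is completely determined by
the values

$$G_{jk} = (\alpha_k|\alpha_j) \tag{8-5}$$

it assumes on pairs of vectors in $\mathcal{B}$. If $\alpha = \sum_k x_k\alpha_k$ and $\beta = \sum_j y_j\alpha_j$, then

$$\begin{aligned}
(\alpha|\beta) &= \Bigl(\sum_k x_k\alpha_k \Bigm| \beta\Bigr) \\
&= \sum_k x_k(\alpha_k|\beta) \\
&= \sum_k x_k \sum_j \overline{y_j}\,(\alpha_k|\alpha_j) \\
&= \sum_{j,k} \overline{y_j}\,G_{jk}\,x_k \\
&= Y^*GX
\end{aligned}$$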


where X, Y are the coordinate matrices of $\alpha$, $\beta$ in the ordered basis $\mathcal{B}$,
and G is the matrix with entries $G_{jk} = (\alpha_k|\alpha_j)$. We call G the matrix
of the inner product in the ordered basis $\mathcal{B}$. It follows from (8-5)
that G is hermitian, i.e., that $G = G^*$; however, G is a rather special kind
of hermitian matrix. For G must satisfy the additional condition

$$X^*GX > 0, \quad X \neq 0. \tag{8-6}$$

In particular, G must be invertible. For otherwise there exists an $X \neq 0$
such that $GX = 0$, and for any such X, (8-6) is impossible. More explicitly,
(8-6) says that for any scalars $x_1, \ldots, x_n$ not all of which are 0

$$\sum_{j,k} \overline{x_j}\,G_{jk}\,x_k > 0. \tag{8-7}$$

From this we see immediately that each diagonal entry of G must be
positive; however, this condition on the diagonal entries is by no means
sufficient to insure the validity of (8-6). Sufficient conditions for the
validity of (8-6) will be given later.

The above process is reversible; that is, if G is any $n \times n$ matrix over
F which satisfies (8-6) and the condition $G = G^*$, then G is the matrix in
the ordered basis $\mathcal{B}$ of an inner product on V. This inner product is given
by the equation

$$(\alpha|\beta) = Y^*GX$$

where X and Y are the coordinate matrices of $\alpha$ and $\beta$ in the ordered
basis $\mathcal{B}$.
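The sketch below, assuming square matrices stored as Python lists of rows, evaluates $Y^*GX$ directly from the double sum in (8-7). It also illustrates the remark that positive diagonal entries do not by themselves ensure (8-6): the hermitian matrix $G = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$, chosen here purely as an illustration, has positive diagonal, yet $X^*GX < 0$ for $X = (1, -1)$.

```python
def conj_transpose(M):
    """Return M* (conjugate transpose) of a square matrix given as rows."""
    n = len(M)
    return [[complex(M[k][j]).conjugate() for k in range(n)] for j in range(n)]


def form(G, X, Y):
    """Evaluate Y*GX = sum over j, k of conj(y_j) * G[j][k] * x_k."""
    n = len(G)
    return sum(complex(Y[j]).conjugate() * G[j][k] * X[k]
               for j in range(n) for k in range(n))


G = [[1, 2], [2, 1]]
print(conj_transpose(G) == G)          # True: G is hermitian
print(form(G, [1, 0], [1, 0]).real)    # 1.0: a positive diagonal entry
print(form(G, [1, -1], [1, -1]).real)  # -2.0: (8-6) fails for X = (1, -1)
```

So this G is not the matrix of any inner product, even though it is hermitian with positive diagonal.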


Exercises 


1. Let V be a vector space and ( | ) an inner product on V. 
(a) Show that $(0|\beta) = 0$ for all $\beta$ in V.
(b) Show that if $(\alpha|\beta) = 0$ for all $\beta$ in V, then $\alpha = 0$.


2. Let V be a vector space over F. Show that the sum of two inner products 
on V is an inner product on V. Is the difference of two inner products an inner 
product? Show that a positive multiple of an inner product is an inner product. 


3. Describe explicitly all inner products on $R^1$ and on $C^1$.
4. Verify that the standard inner product on $F^n$ is an inner product.
5. Let ( | ) be the standard inner product on $R^2$.
(a) Let $\alpha = (1, 2)$, $\beta = (-1, 1)$. If $\gamma$ is a vector such that $(\alpha|\gamma) = -1$ and
$(\beta|\gamma) = 3$, find $\gamma$.
(b) Show that for any $\alpha$ in $R^2$ we have $\alpha = (\alpha|\epsilon_1)\epsilon_1 + (\alpha|\epsilon_2)\epsilon_2$.
6. Let ( | ) be the standard inner product on $R^2$, and let T be the linear operator
$T(x_1, x_2) = (-x_2, x_1)$. Now T is ‘rotation through 90°’ and has the property
that $(\alpha|T\alpha) = 0$ for all $\alpha$ in $R^2$. Find all inner products [ | ] on $R^2$ such that
$[\alpha|T\alpha] = 0$ for each $\alpha$.
7. Let ( | ) be the standard inner product on $C^2$. Prove that there is no non-zero
linear operator on $C^2$ such that $(\alpha|T\alpha) = 0$ for every $\alpha$ in $C^2$. Generalize.
8. Let A be a $2 \times 2$ matrix with real entries. For X, Y in $R^{2\times 1}$ let

$$f_A(X, Y) = Y^tAX.$$

Show that $f_A$ is an inner product on $R^{2\times 1}$ if and only if $A = A^t$, $A_{11} > 0$, $A_{22} > 0$,
and $\det A > 0$.


9. Let V be a real or complex vector space with an inner product. Show that the
quadratic form determined by the inner product satisfies the parallelogram law

$$\|\alpha + \beta\|^2 + \|\alpha - \beta\|^2 = 2\|\alpha\|^2 + 2\|\beta\|^2.$$


10. Let ( | ) be the inner product on $R^2$ defined in Example 2, and let $\mathcal{B}$ be
the standard ordered basis for $R^2$. Find the matrix of this inner product relative
to $\mathcal{B}$.


11. Show that the formula

$$\Bigl(\sum_j a_jx^j \Bigm| \sum_k b_kx^k\Bigr) = \sum_{j,k} \frac{a_jb_k}{j + k + 1}$$

defines an inner product on the space $R[x]$ of polynomials over the field R. Let W
be the subspace of polynomials of degree less than or equal to n. Restrict the above
inner product to W, and find the matrix of this inner product on W, relative to the
ordered basis $\{1, x, x^2, \ldots, x^n\}$. (Hint: To show that the formula defines an inner
product, observe that

$$(f|g) = \int_0^1 f(t)g(t)\,dt$$

and work with the integral.)


12. Let V be a finite-dimensional vector space and let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be a
basis for V. Let ( | ) be an inner product on V. If $c_1, \ldots, c_n$ are any n scalars,
show that there is exactly one vector $\alpha$ in V such that $(\alpha|\alpha_j) = c_j$, $j = 1, \ldots, n$.


13. Let V be a complex vector space. A function J from V into V is called a
conjugation if $J(\alpha + \beta) = J(\alpha) + J(\beta)$, $J(c\alpha) = \bar{c}J(\alpha)$, and $J(J(\alpha)) = \alpha$, for
all scalars c and all $\alpha$, $\beta$ in V. If J is a conjugation show that:

(a) The set W of all $\alpha$ in V such that $J\alpha = \alpha$ is a vector space over R with
respect to the operations defined in V.

(b) For each $\alpha$ in V there exist unique vectors $\beta$, $\gamma$ in W such that $\alpha = \beta + i\gamma$.


14. Let V be a complex vector space and W a subset of V with the following
properties:
(a) W is a real vector space with respect to the operations defined in V.
(b) For each $\alpha$ in V there exist unique vectors $\beta$, $\gamma$ in W such that $\alpha = \beta + i\gamma$.


Show that the equation $J\alpha = \beta - i\gamma$ defines a conjugation on V such that $J\alpha = \alpha$
if and only if $\alpha$ belongs to W, and show also that J is the only conjugation on V
with this property.


15. Find all conjugations on $C^1$ and $C^2$.


16. Let W be a finite-dimensional real subspace of a complex vector space V. 
Show that W satisfies condition (b) of Exercise 14 if and only if every basis of W 
is also a basis of V. 


Sec. 8.2 Inner Product Spaces 


17. Let V be a complex vector space, J a conjugation on V, W the set of $\alpha$ in V
such that $J\alpha = \alpha$, and f an inner product on W. Show that:

(a) There is a unique inner product g on V such that $g(\alpha, \beta) = f(\alpha, \beta)$ for
all $\alpha$, $\beta$ in W;

(b) $g(J\alpha, J\beta) = g(\beta, \alpha)$ for all $\alpha$, $\beta$ in V.


What does part (a) say about the relation between the standard inner products
on $R^1$ and $C^1$, or on $R^n$ and $C^n$?
Now that we have some idea of what an inner product is, we shall
turn our attention to what can be said about the combination of a vector
space and some particular inner product on it. Specifically, we shall
establish the basic properties of the concepts of ‘length’ and ‘orthogonality’
which are imposed on the space by the inner product.


Definition. An inner product space is a real or complex vector space,
together with a specified inner product on that space.


A finite-dimensional real inner product space is often called a Euclidean
space. A complex inner product space is often referred to as a unitary
space.


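Theorem 1. If V is an inner product space, then for any vectors $\alpha$, $\beta$
in V and any scalar c

(i) $\|c\alpha\| = |c|\,\|\alpha\|$;
(ii) $\|\alpha\| > 0$ for $\alpha \neq 0$;
(iii) $|(\alpha|\beta)| \le \|\alpha\|\,\|\beta\|$;
(iv) $\|\alpha + \beta\| \le \|\alpha\| + \|\beta\|$.

Proof. Statements (i) and (ii) follow almost immediately from
the various definitions involved. The inequality in (iii) is clearly valid
when $\alpha = 0$. If $\alpha \neq 0$, put

$$\gamma = \beta - \frac{(\beta|\alpha)}{\|\alpha\|^2}\,\alpha.$$

Then $(\gamma|\alpha) = 0$ and

$$\begin{aligned}
0 \le \|\gamma\|^2 &= \Bigl(\beta - \frac{(\beta|\alpha)}{\|\alpha\|^2}\alpha \Bigm| \beta - \frac{(\beta|\alpha)}{\|\alpha\|^2}\alpha\Bigr) \\
&= (\beta|\beta) - \frac{(\beta|\alpha)(\alpha|\beta)}{\|\alpha\|^2} \\
&= \|\beta\|^2 - \frac{|(\alpha|\beta)|^2}{\|\alpha\|^2}.
\end{aligned}$$

Hence $|(\alpha|\beta)|^2 \le \|\alpha\|^2\|\beta\|^2$. Now using (c) we find that

$$\begin{aligned}
\|\alpha + \beta\|^2 &= \|\alpha\|^2 + (\alpha|\beta) + (\beta|\alpha) + \|\beta\|^2 \\
&= \|\alpha\|^2 + 2\operatorname{Re}(\alpha|\beta) + \|\beta\|^2 \\
&\le \|\alpha\|^2 + 2\|\alpha\|\,\|\beta\| + \|\beta\|^2 \\
&= (\|\alpha\| + \|\beta\|)^2.
\end{aligned}$$

Thus, $\|\alpha + \beta\| \le \|\alpha\| + \|\beta\|$. ∎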


The inequality in (iii) is called the Cauchy-Schwarz inequality. 
It has a wide variety of applications. The proof shows that if (for example)
$\alpha$ is non-zero, then $|(\alpha|\beta)| < \|\alpha\|\,\|\beta\|$ unless

$$\beta = \frac{(\beta|\alpha)}{\|\alpha\|^2}\,\alpha.$$

Thus, equality occurs in (iii) if and only if $\alpha$ and $\beta$ are linearly dependent.


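EXAMPLE 7. If we apply the Cauchy-Schwarz inequality to the
inner products given in Examples 1, 2, 3, and 5, we obtain the following:

(a) $\bigl|\sum_k x_k\overline{y_k}\bigr| \le \bigl(\sum_k |x_k|^2\bigr)^{1/2}\bigl(\sum_k |y_k|^2\bigr)^{1/2}$

(b) $|x_1y_1 - x_2y_1 - x_1y_2 + 4x_2y_2| \le \bigl((x_1 - x_2)^2 + 3x_2^2\bigr)^{1/2}\bigl((y_1 - y_2)^2 + 3y_2^2\bigr)^{1/2}$

(c) $|\operatorname{tr}(AB^*)| \le (\operatorname{tr}(AA^*))^{1/2}(\operatorname{tr}(BB^*))^{1/2}$

(d) $\Bigl|\int_0^1 f(x)\overline{g(x)}\,dx\Bigr| \le \Bigl(\int_0^1 |f(x)|^2\,dx\Bigr)^{1/2}\Bigl(\int_0^1 |g(x)|^2\,dx\Bigr)^{1/2}$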
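Inequality (c) is easy to probe numerically. The following sketch assumes complex matrices stored as lists of rows and uses the identity $\operatorname{tr}(AB^*) = \sum_{j,k} A_{jk}\overline{B_{jk}}$; the sample matrices are arbitrary. The last check illustrates that equality holds when B is a scalar multiple of A, in line with the remark that equality in (iii) occurs exactly for linearly dependent vectors.

```python
import math


def trace_inner(A, B):
    """(A|B) = tr(A B*), computed entrywise as the sum of A[j][k] * conj(B[j][k])."""
    return sum(a * complex(b).conjugate()
               for row_a, row_b in zip(A, B)
               for a, b in zip(row_a, row_b))


def norm(A):
    """||A|| = tr(A A*)^(1/2)."""
    return math.sqrt(trace_inner(A, A).real)


A = [[1, 2j], [0, 3]]
B = [[2, -1], [1j, 1]]
print(abs(trace_inner(A, B)) <= norm(A) * norm(B))  # True: Cauchy-Schwarz

C = [[(2 - 1j) * a for a in row] for row in A]  # C is a scalar multiple of A
print(math.isclose(abs(trace_inner(A, C)), norm(A) * norm(C)))  # True: equality
```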


Definitions. Let $\alpha$ and $\beta$ be vectors in an inner product space V. Then $\alpha$
is orthogonal to $\beta$ if $(\alpha|\beta) = 0$; since this implies $\beta$ is orthogonal to $\alpha$,
we often simply say that $\alpha$ and $\beta$ are orthogonal. If S is a set of vectors in V,
S is called an orthogonal set provided all pairs of distinct vectors in S are
orthogonal. An orthonormal set is an orthogonal set S with the additional
property that $\|\alpha\| = 1$ for every $\alpha$ in S.


The zero vector is orthogonal to every vector in V and is the only 
vector with this property. It is appropriate to think of an orthonormal 
set as a set of mutually perpendicular vectors, each having length 1. 


EXAMPLE 8. The standard basis of either $R^n$ or $C^n$ is an orthonormal
set with respect to the standard inner product.


EXAMPLE 9. The vector $(x, y)$ in $R^2$ is orthogonal to $(-y, x)$ with
respect to the standard inner product, for

$$((x, y)|(-y, x)) = -xy + yx = 0.$$

However, if $R^2$ is equipped with the inner product of Example 2, then
$(x, y)$ and $(-y, x)$ are orthogonal if and only if

$$y = \tfrac{1}{2}(-3 \pm \sqrt{13})\,x.$$
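The condition in Example 9 can be checked directly. Reading the inner product of Example 2 off Example 7(b), $((x_1, x_2)|(y_1, y_2)) = x_1y_1 - x_2y_1 - x_1y_2 + 4x_2y_2$, one finds $((x, y)|(-y, x)) = y^2 + 3xy - x^2$, whose roots in y are the two values displayed above. A small floating-point sketch (the tolerance is an arbitrary choice):

```python
import math


def ip2(a, b):
    """Inner product of Example 2 on R^2, read off from Example 7(b):
    (x|y) = x1*y1 - x2*y1 - x1*y2 + 4*x2*y2."""
    (x1, x2), (y1, y2) = a, b
    return x1 * y1 - x2 * y1 - x1 * y2 + 4 * x2 * y2


x = 1.0
for sign in (+1, -1):
    y = 0.5 * (-3 + sign * math.sqrt(13)) * x
    # (x, y) and (-y, x) should be orthogonal for both choices of sign.
    print(abs(ip2((x, y), (-y, x))) < 1e-12)  # True
```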


EXAMPLE 10. Let V be $C^{n\times n}$, the space of complex $n \times n$ matrices,
and let $E^{pq}$ be the matrix whose only non-zero entry is a 1 in row p and
column q. Then the set of all such matrices $E^{pq}$ is orthonormal with respect
to the inner product given in Example 3. For

$$(E^{pq}|E^{rs}) = \operatorname{tr}(E^{pq}E^{sr}) = \delta_{qs}\operatorname{tr}(E^{pr}) = \delta_{qs}\delta_{pr}.$$


EXAMPLE 11. Let V be the space of continuous complex-valued (or
real-valued) functions on the interval $0 \le x \le 1$ with the inner product

$$(f|g) = \int_0^1 f(x)\overline{g(x)}\,dx.$$

Suppose $f_n(x) = \sqrt{2}\cos 2\pi nx$ and that $g_n(x) = \sqrt{2}\sin 2\pi nx$. Then
$\{1, f_1, g_1, f_2, g_2, \ldots\}$ is an infinite orthonormal set. In the complex case,
we may also form the linear combinations

$$\frac{1}{\sqrt{2}}(f_n \pm ig_n), \quad n = 1, 2, \ldots.$$

In this way we get a new orthonormal set S which consists of all functions
of the form

$$h_n(x) = e^{2\pi inx}, \quad n = \pm 1, \pm 2, \ldots.$$

The set S' obtained from S by adjoining the constant function 1 is also
orthonormal. We assume here that the reader is familiar with the calculation
of the integrals in question.


The orthonormal sets given in the examples above are all linearly 
independent. We show now that this is necessarily the case. 


Theorem 2. An orthogonal set of non-zero vectors is linearly independent.


Proof. Let S be a finite or infinite orthogonal set of non-zero
vectors in a given inner product space. Suppose $\alpha_1, \alpha_2, \ldots, \alpha_m$ are distinct
vectors in S and that

$$\beta = c_1\alpha_1 + c_2\alpha_2 + \cdots + c_m\alpha_m.$$

Then

$$(\beta|\alpha_k) = \Bigl(\sum_j c_j\alpha_j \Bigm| \alpha_k\Bigr) = \sum_j c_j(\alpha_j|\alpha_k) = c_k(\alpha_k|\alpha_k).$$

Since $(\alpha_k|\alpha_k) \neq 0$, it follows that

$$c_k = \frac{(\beta|\alpha_k)}{\|\alpha_k\|^2}, \quad 1 \le k \le m.$$

Thus when $\beta = 0$, each $c_k = 0$; so S is an independent set. ∎


Corollary. If a vector $\beta$ is a linear combination of an orthogonal
sequence of non-zero vectors $\alpha_1, \ldots, \alpha_m$, then $\beta$ is the particular linear
combination

$$\beta = \sum_{k=1}^{m} \frac{(\beta|\alpha_k)}{\|\alpha_k\|^2}\,\alpha_k. \tag{8-8}$$

This corollary follows from the proof of the theorem. There is another
corollary which, although obvious, should be mentioned. If $\{\alpha_1, \ldots, \alpha_m\}$
is an orthogonal set of non-zero vectors in a finite-dimensional inner
product space V, then $m \le \dim V$. This says that the number of mutually
orthogonal directions in V cannot exceed the algebraically defined dimension
of V. The maximum number of mutually orthogonal directions in V
is what one would intuitively regard as the geometric dimension of V,
and we have just seen that this is not greater than the algebraic dimension.
The fact that these two dimensions are equal is a particular corollary of
the next result.


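Theorem 3. Let V be an inner product space and let $\beta_1, \ldots, \beta_n$ be
any independent vectors in V. Then one may construct orthogonal vectors
$\alpha_1, \ldots, \alpha_n$ in V such that for each $k = 1, 2, \ldots, n$ the set

$$\{\alpha_1, \ldots, \alpha_k\}$$

is a basis for the subspace spanned by $\beta_1, \ldots, \beta_k$.


Proof. The vectors $\alpha_1, \ldots, \alpha_n$ will be obtained by means of a
construction known as the Gram-Schmidt orthogonalization process.
First let $\alpha_1 = \beta_1$. The other vectors are then given inductively as follows:


Suppose $\alpha_1, \ldots, \alpha_m$ $(1 \le m < n)$ have been chosen so that for every k

$$\{\alpha_1, \ldots, \alpha_k\}, \quad 1 \le k \le m$$

is an orthogonal basis for the subspace of V that is spanned by $\beta_1, \ldots, \beta_k$.
To construct the next vector $\alpha_{m+1}$, let

$$\alpha_{m+1} = \beta_{m+1} - \sum_{k=1}^{m} \frac{(\beta_{m+1}|\alpha_k)}{\|\alpha_k\|^2}\,\alpha_k. \tag{8-9}$$

Then $\alpha_{m+1} \neq 0$. For otherwise $\beta_{m+1}$ is a linear combination of $\alpha_1, \ldots, \alpha_m$
and hence a linear combination of $\beta_1, \ldots, \beta_m$, contradicting the independence
of $\beta_1, \ldots, \beta_{m+1}$. Furthermore, if $1 \le j \le m$, then

$$\begin{aligned}
(\alpha_{m+1}|\alpha_j) &= (\beta_{m+1}|\alpha_j) - \sum_{k=1}^{m} \frac{(\beta_{m+1}|\alpha_k)}{\|\alpha_k\|^2}(\alpha_k|\alpha_j) \\
&= (\beta_{m+1}|\alpha_j) - (\beta_{m+1}|\alpha_j) \\
&= 0.
\end{aligned}$$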
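Therefore $\{\alpha_1, \ldots, \alpha_{m+1}\}$ is an orthogonal set consisting of $m + 1$ non-zero
vectors in the subspace spanned by $\beta_1, \ldots, \beta_{m+1}$. By Theorem 2,
it is a basis for this subspace. Thus the vectors $\alpha_1, \ldots, \alpha_n$ may be constructed
one after the other in accordance with (8-9). In particular, when
$n = 4$, we have

$$\begin{aligned}
\alpha_1 &= \beta_1 \\
\alpha_2 &= \beta_2 - \frac{(\beta_2|\alpha_1)}{\|\alpha_1\|^2}\alpha_1 \\
\alpha_3 &= \beta_3 - \frac{(\beta_3|\alpha_1)}{\|\alpha_1\|^2}\alpha_1 - \frac{(\beta_3|\alpha_2)}{\|\alpha_2\|^2}\alpha_2 \\
\alpha_4 &= \beta_4 - \frac{(\beta_4|\alpha_1)}{\|\alpha_1\|^2}\alpha_1 - \frac{(\beta_4|\alpha_2)}{\|\alpha_2\|^2}\alpha_2 - \frac{(\beta_4|\alpha_3)}{\|\alpha_3\|^2}\alpha_3.
\end{aligned} \tag{8-10}$$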
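The recursion (8-9) translates almost line for line into code. The sketch below is restricted, as a simplifying assumption, to the standard inner product on $R^n$ with rational entries, so that exact `Fraction` arithmetic avoids rounding; it returns the orthogonal (not normalized) vectors $\alpha_1, \ldots, \alpha_n$. Run on the vectors of Example 12, it reproduces the $\alpha$'s computed there.

```python
from fractions import Fraction


def inner(u, v):
    """Standard inner product on R^n (real entries, so no conjugation)."""
    return sum(x * y for x, y in zip(u, v))


def gram_schmidt(betas):
    """Orthogonalize independent vectors beta_1, ..., beta_n using (8-9):
    alpha_{m+1} = beta_{m+1} - sum_k (beta_{m+1}|alpha_k) / ||alpha_k||^2 * alpha_k.
    Entries are kept as exact Fractions; the alphas are not normalized."""
    alphas = []
    for beta in betas:
        alpha = [Fraction(x) for x in beta]
        for a in alphas:
            coeff = Fraction(inner(beta, a)) / inner(a, a)
            alpha = [x - coeff * y for x, y in zip(alpha, a)]
        alphas.append(alpha)
    return alphas


alphas = gram_schmidt([(3, 0, 4), (-1, 0, 7), (2, 9, 11)])
print([[int(x) for x in a] for a in alphas])  # [[3, 0, 4], [-4, 0, 3], [0, 9, 0]]
```

Dividing each $\alpha_k$ by $\|\alpha_k\|$ afterwards would give an orthonormal set, but that step generally leaves the rationals, which is why it is omitted here.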


Corollary. Every finite-dimensional inner product space has an orthonormal
basis.


Proof. Let V be a finite-dimensional inner product space and
$\{\beta_1, \ldots, \beta_n\}$ a basis for V. Apply the Gram-Schmidt process to construct
an orthogonal basis $\{\alpha_1, \ldots, \alpha_n\}$. Then to obtain an orthonormal basis,
simply replace each vector $\alpha_k$ by $\alpha_k/\|\alpha_k\|$. ∎


One of the main advantages which orthonormal bases have over
arbitrary bases is that computations involving coordinates are simpler.
To indicate in general terms why this is true, suppose that V is a finite-dimensional
inner product space. Then, as in the last section, we may use
Equation (8-5) to associate a matrix G with every ordered basis
$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ of V. Using this matrix

$$G_{jk} = (\alpha_k|\alpha_j),$$

we may compute inner products in terms of coordinates. If $\mathcal{B}$ is an orthonormal
basis, then G is the identity matrix, and for any scalars $x_j$ and $y_k$

$$\Bigl(\sum_j x_j\alpha_j \Bigm| \sum_k y_k\alpha_k\Bigr) = \sum_j x_j\overline{y_j}.$$

Thus in terms of an orthonormal basis, the inner product in V looks like
the standard inner product in $F^n$.

Although it is of limited practical use for computations, it is interesting
to note that the Gram-Schmidt process may also be used to test
for linear dependence. For suppose $\beta_1, \ldots, \beta_n$ are linearly dependent
vectors in an inner product space V. To exclude a trivial case, assume
that $\beta_1 \neq 0$. Let m be the largest integer for which $\beta_1, \ldots, \beta_m$ are independent.
Then $1 \le m < n$. Let $\alpha_1, \ldots, \alpha_m$ be the vectors obtained by
applying the orthogonalization process to $\beta_1, \ldots, \beta_m$. Then the vector
$\alpha_{m+1}$ given by (8-9) is necessarily 0. For $\alpha_{m+1}$ is in the subspace spanned




Inner Product Spaces Chap. 8 


by $\alpha_1, \ldots, \alpha_m$ and orthogonal to each of these vectors; hence it is 0 by
(8-8). Conversely, if $\alpha_1, \ldots, \alpha_m$ are different from 0 and $\alpha_{m+1} = 0$, then
$\beta_1, \ldots, \beta_{m+1}$ are linearly dependent.
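A sketch of this dependence test, again assuming the standard inner product on $R^n$ with rational entries and exact `Fraction` arithmetic (with floating point one would have to compare against a tolerance rather than against 0). The function runs (8-9) and reports the first index at which a zero vector appears; the name `first_dependent` is hypothetical.

```python
from fractions import Fraction


def inner(u, v):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))


def first_dependent(betas):
    """Apply (8-9) to beta_1, beta_2, ... in turn. Return the 1-based index of
    the first beta_{m+1} whose alpha_{m+1} is 0 (so beta_1, ..., beta_{m+1} are
    dependent), or None if the whole list is independent."""
    alphas = []
    for i, beta in enumerate(betas, start=1):
        alpha = [Fraction(x) for x in beta]
        for a in alphas:
            coeff = Fraction(inner(beta, a)) / inner(a, a)
            alpha = [x - coeff * y for x, y in zip(alpha, a)]
        if all(x == 0 for x in alpha):
            return i
        alphas.append(alpha)
    return None


print(first_dependent([(3, 0, 4), (-1, 0, 7), (5, 0, 15)]))  # 3
print(first_dependent([(3, 0, 4), (-1, 0, 7), (2, 9, 11)]))  # None
```

Because the function returns as soon as a zero vector appears, it never divides by $\|0\|^2$.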


EXAMPLE 12. Consider the vectors

$$\begin{aligned}
\beta_1 &= (3, 0, 4) \\
\beta_2 &= (-1, 0, 7) \\
\beta_3 &= (2, 9, 11)
\end{aligned}$$

in $R^3$ equipped with the standard inner product. Applying the Gram-Schmidt
process to $\beta_1, \beta_2, \beta_3$, we obtain the following vectors.

$$\begin{aligned}
\alpha_1 &= (3, 0, 4) \\
\alpha_2 &= (-1, 0, 7) - \frac{((-1, 0, 7)|(3, 0, 4))}{25}(3, 0, 4) \\
&= (-1, 0, 7) - (3, 0, 4) \\
&= (-4, 0, 3) \\
\alpha_3 &= (2, 9, 11) - \frac{((2, 9, 11)|(3, 0, 4))}{25}(3, 0, 4) - \frac{((2, 9, 11)|(-4, 0, 3))}{25}(-4, 0, 3) \\
&= (2, 9, 11) - 2(3, 0, 4) - (-4, 0, 3) \\
&= (0, 9, 0).
\end{aligned}$$

These vectors are evidently non-zero and mutually orthogonal. Hence
$\{\alpha_1, \alpha_2, \alpha_3\}$ is an orthogonal basis for $R^3$. To express an arbitrary vector
$(x_1, x_2, x_3)$ in $R^3$ as a linear combination of $\alpha_1, \alpha_2, \alpha_3$ it is not necessary to
solve any linear equations. For it suffices to use (8-8). Thus

$$(x_1, x_2, x_3) = \frac{3x_1 + 4x_3}{25}\,\alpha_1 + \frac{-4x_1 + 3x_3}{25}\,\alpha_2 + \frac{x_2}{9}\,\alpha_3$$

as is readily verified. In particular,

$$(1, 2, 3) = \tfrac{3}{5}(3, 0, 4) + \tfrac{1}{5}(-4, 0, 3) + \tfrac{2}{9}(0, 9, 0).$$
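Since $\{\alpha_1, \alpha_2, \alpha_3\}$ is orthogonal, the coefficients come from (8-8) alone. A minimal sketch in exact rational arithmetic (the helper names are illustrative) computes them for $(1, 2, 3)$ and rebuilds the vector as a check:

```python
from fractions import Fraction

# Orthogonal basis of R^3 obtained in Example 12.
alphas = [(3, 0, 4), (-4, 0, 3), (0, 9, 0)]


def inner(u, v):
    """Standard inner product on R^3."""
    return sum(x * y for x, y in zip(u, v))


def coords(x):
    """Coefficients of x by (8-8): (x|alpha_k) / ||alpha_k||^2, k = 1, 2, 3."""
    return [Fraction(inner(x, a), inner(a, a)) for a in alphas]


c = coords((1, 2, 3))
print(c)  # [Fraction(3, 5), Fraction(1, 5), Fraction(2, 9)]

# Rebuild the vector from its coefficients; no linear system is solved anywhere.
rebuilt = [sum(ck * a[i] for ck, a in zip(c, alphas)) for i in range(3)]
print(rebuilt == [1, 2, 3])  # True
```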


To put this point in another way, what we have shown is the following: 
The basis $\{f_1, f_2, f_3\}$ of $(R^3)^*$ which is dual to the basis $\{\alpha_1, \alpha_2, \alpha_3\}$ is defined
explicitly by the equations

$$\begin{aligned}
f_1(x_1, x_2, x_3) &= \frac{3x_1 + 4x_3}{25} \\
f_2(x_1, x_2, x_3) &= \frac{-4x_1 + 3x_3}{25} \\
f_3(x_1, x_2, x_3) &= \frac{x_2}{9}
\end{aligned}$$

and these equations may be written more generally in the form

$$f_j(x_1, x_2, x_3) = \frac{((x_1, x_2, x_3)|\alpha_j)}{\|\alpha_j\|^2}.$$

Finally, note that from $\alpha_1, \alpha_2, \alpha_3$ we get the orthonormal basis

$$\tfrac{1}{5}(3, 0, 4), \quad \tfrac{1}{5}(-4, 0, 3), \quad (0, 1, 0).$$


EXAMPLE 13. Let $A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$ where a, b, c, and d are complex numbers.
Set $\beta_1 = (a, b)$, $\beta_2 = (c, d)$, and suppose that $\beta_1 \neq 0$. If we apply
the orthogonalization process to $\beta_1, \beta_2$, using the standard inner product
in $C^2$, we obtain the following vectors:

$$\begin{aligned}
\alpha_1 &= (a, b) \\
\alpha_2 &= (c, d) - \frac{((c, d)|(a, b))}{|a|^2 + |b|^2}(a, b) \\
&= (c, d) - \frac{c\bar{a} + d\bar{b}}{|a|^2 + |b|^2}(a, b) \\
&= \Bigl(\frac{cb\bar{b} - da\bar{b}}{|a|^2 + |b|^2}, \frac{da\bar{a} - cb\bar{a}}{|a|^2 + |b|^2}\Bigr) \\
&= \frac{\det A}{|a|^2 + |b|^2}(-\bar{b}, \bar{a}).
\end{aligned}$$

Now the general theory tells us that $\alpha_2 \neq 0$ if and only if $\beta_1, \beta_2$ are linearly
independent. On the other hand, the formula for $\alpha_2$ shows that this is the
case if and only if $\det A \neq 0$.
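The closed form for $\alpha_2$ can be compared with the raw Gram-Schmidt step numerically. The sketch assumes the standard inner product on $C^2$ and sample entries chosen arbitrarily; the tolerance absorbs floating-point rounding.

```python
def inner(u, v):
    """Standard inner product on C^2: linear in u, conjugate-linear in v."""
    return sum(x * complex(y).conjugate() for x, y in zip(u, v))


def alpha2(a, b, c, d):
    """Second vector of the Gram-Schmidt process applied to (a, b), (c, d)."""
    b1, b2 = (a, b), (c, d)
    coeff = inner(b2, b1) / inner(b1, b1)
    return tuple(y - coeff * x for x, y in zip(b1, b2))


def closed_form(a, b, c, d):
    """det A / (|a|^2 + |b|^2) * (-conj(b), conj(a)), as derived in Example 13."""
    k = (a * d - b * c) / (abs(a) ** 2 + abs(b) ** 2)
    return (-k * complex(b).conjugate(), k * complex(a).conjugate())


a, b, c, d = 1 + 2j, 3 - 1j, 2j, 1 + 1j
print(all(abs(p - q) < 1e-12
          for p, q in zip(alpha2(a, b, c, d), closed_form(a, b, c, d))))  # True
# Proportional rows (det A = 0): alpha_2 collapses to the zero vector.
print(all(abs(p) < 1e-12 for p in alpha2(a, b, 2 * a, 2 * b)))  # True
```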


In essence, the Gram-Schmidt process consists of repeated applications
of a basic geometric operation called orthogonal projection, and it
is best understood from this point of view. The method of orthogonal
projection also arises naturally in the solution of an important approximation
problem.

Suppose W is a subspace of an inner product space V, and let $\beta$ be
an arbitrary vector in V. The problem is to find a best possible approximation
to $\beta$ by vectors in W. This means we want to find a vector $\alpha$ for which
$\|\beta - \alpha\|$ is as small as possible subject to the restriction that $\alpha$ should
belong to W. Let us make our language precise.

A best approximation to $\beta$ by vectors in W is a vector $\alpha$ in W such that

$$\|\beta - \alpha\| \le \|\beta - \gamma\|$$

for every vector $\gamma$ in W.

By looking at this problem in $R^2$ or in $R^3$, one sees intuitively that a
best approximation to $\beta$ by vectors in W ought to be a vector $\alpha$ in W such
that $\beta - \alpha$ is perpendicular (orthogonal) to W and that there ought to
be exactly one such $\alpha$. These intuitive ideas are correct for finite-dimensional
subspaces and for some, but not all, infinite-dimensional subspaces.
Since the precise situation is too complicated to treat here, we shall prove
only the following result.


Theorem 4. Let W be a subspace of an inner product space V and
let $\beta$ be a vector in V.

(i) The vector $\alpha$ in W is a best approximation to $\beta$ by vectors in W if
and only if $\beta - \alpha$ is orthogonal to every vector in W.
(ii) If a best approximation to $\beta$ by vectors in W exists, it is unique.
(iii) If W is finite-dimensional and $\{\alpha_1, \ldots, \alpha_n\}$ is any orthogonal
basis for W, then the vector

$$\alpha = \sum_k \frac{(\beta|\alpha_k)}{\|\alpha_k\|^2}\,\alpha_k \tag{8-11}$$

is the (unique) best approximation to $\beta$ by vectors in W.

Proof. First note that if $\gamma$ is any vector in V, then
$\beta - \gamma = (\beta - \alpha) + (\alpha - \gamma)$, and

$$\|\beta - \gamma\|^2 = \|\beta - \alpha\|^2 + 2\operatorname{Re}(\beta - \alpha|\alpha - \gamma) + \|\alpha - \gamma\|^2.$$

Now suppose $\beta - \alpha$ is orthogonal to every vector in W, that $\gamma$ is in W
and that $\gamma \neq \alpha$. Then, since $\alpha - \gamma$ is in W, it follows that

$$\|\beta - \gamma\|^2 = \|\beta - \alpha\|^2 + \|\alpha - \gamma\|^2 > \|\beta - \alpha\|^2.$$

Conversely, suppose that $\|\beta - \gamma\| \ge \|\beta - \alpha\|$ for every $\gamma$ in W.
Then from the first equation above it follows that

$$2\operatorname{Re}(\beta - \alpha|\alpha - \gamma) + \|\alpha - \gamma\|^2 \ge 0$$

for all $\gamma$ in W. Since every vector in W may be expressed in the form
$\alpha - \gamma$ with $\gamma$ in W, we see that

$$2\operatorname{Re}(\beta - \alpha|\tau) + \|\tau\|^2 \ge 0$$

for every $\tau$ in W. In particular, if $\gamma$ is in W and $\gamma \neq \alpha$, we may take

$$\tau = -\frac{(\beta - \alpha|\alpha - \gamma)}{\|\alpha - \gamma\|^2}\,(\alpha - \gamma).$$

Then the inequality reduces to the statement

$$-2\,\frac{|(\beta - \alpha|\alpha - \gamma)|^2}{\|\alpha - \gamma\|^2} + \frac{|(\beta - \alpha|\alpha - \gamma)|^2}{\|\alpha - \gamma\|^2} \ge 0.$$

This holds if and only if $(\beta - \alpha|\alpha - \gamma) = 0$. Therefore, $\beta - \alpha$ is orthogonal
to every vector in W. This completes the proof of the equivalence
of the two conditions on $\alpha$ given in (i). The orthogonality condition is
evidently satisfied by at most one vector in W, which proves (ii).
Now suppose that W is a finite-dimensional subspace of V. Then we
know, as a corollary of Theorem 3, that W has an orthogonal basis. Let
$\{\alpha_1, \ldots, \alpha_n\}$ be any orthogonal basis for W and define $\alpha$ by (8-11). Then,
by the computation in the proof of Theorem 3, $\beta - \alpha$ is orthogonal to
each of the vectors $\alpha_k$ ($\beta - \alpha$ is the vector obtained at the last stage when
the orthogonalization process is applied to $\alpha_1, \ldots, \alpha_n, \beta$). Thus $\beta - \alpha$ is
orthogonal to every linear combination of $\alpha_1, \ldots, \alpha_n$, i.e., to every vector
in W. If $\gamma$ is in W and $\gamma \neq \alpha$, it follows that $\|\beta - \gamma\| > \|\beta - \alpha\|$. Therefore,
$\alpha$ is the best approximation to $\beta$ that lies in W. ∎
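Formula (8-11) and the characterization in (i) can be exercised together in a few lines. The sketch assumes the standard inner product on $R^3$ with exact rational arithmetic; the subspace W (spanned by the orthogonal vectors $(3, 0, 4)$, $(-4, 0, 3)$ of Example 12) and the vector $\beta = (1, 2, 3)$ are illustrative choices.

```python
from fractions import Fraction


def inner(u, v):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))


def best_approx(beta, basis):
    """Formula (8-11): alpha = sum_k (beta|alpha_k)/||alpha_k||^2 * alpha_k,
    where basis is an orthogonal set of non-zero vectors spanning W."""
    alpha = [Fraction(0)] * len(beta)
    for a in basis:
        coeff = Fraction(inner(beta, a), inner(a, a))
        alpha = [x + coeff * y for x, y in zip(alpha, a)]
    return alpha


def dist_sq(u, v):
    """||u - v||^2."""
    d = [x - y for x, y in zip(u, v)]
    return inner(d, d)


basis = [(3, 0, 4), (-4, 0, 3)]  # orthogonal basis of W (the x1-x3 plane)
beta = (1, 2, 3)
alpha = best_approx(beta, basis)
residual = [b - x for b, x in zip(beta, alpha)]
print(alpha == [1, 0, 3])                           # True
print(all(inner(residual, a) == 0 for a in basis))  # True: beta - alpha is orthogonal to W

# Other vectors gamma of W are strictly farther from beta, as part (i) predicts.
gammas = [[alpha[i] + t * basis[0][i] for i in range(3)] for t in (-2, -1, 1, 2)]
print(all(dist_sq(beta, g) > dist_sq(beta, alpha) for g in gammas))  # True
```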


Definition. Let V be an inner product space and S any set of vectors
in V. The orthogonal complement of S is the set $S^{\perp}$ of all vectors in V
which are orthogonal to every vector in S.


The orthogonal complement of V is the zero subspace, and conversely
$\{0\}^{\perp} = V$. If S is any subset of V, its orthogonal complement $S^{\perp}$ (S perp)
is always a subspace of V. For $S^{\perp}$ is non-empty, since it contains 0; and
whenever $\alpha$ and $\beta$ are in $S^{\perp}$ and c is any scalar,

$$(c\alpha + \beta|\gamma) = c(\alpha|\gamma) + (\beta|\gamma) = c0 + 0 = 0$$

for every $\gamma$ in S; thus $c\alpha + \beta$ also lies in $S^{\perp}$. In Theorem 4 the characteristic
property of the vector $\alpha$ is that it is the only vector in W such that
$\beta - \alpha$ belongs to $W^{\perp}$.


Definition. Whenever the vector $\alpha$ in Theorem 4 exists it is called the
orthogonal projection of $\beta$ on W. If every vector in V has an orthogonal
projection on W, the mapping that assigns to each vector in V its orthogonal
projection on W is called the orthogonal projection of V on W.


By Theorem 4, the orthogonal projection of an inner product space
on a finite-dimensional subspace always exists. But Theorem 4 also implies
the following result.


Corollary. Let V be an inner product space, W a finite-dimensional
subspace, and E the orthogonal projection of V on W. Then the mapping

$$\beta \to \beta - E\beta$$

is the orthogonal projection of V on $W^{\perp}$.


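Proof. Let $\beta$ be an arbitrary vector in V. Then $\beta - E\beta$ is in $W^{\perp}$,
and for any $\gamma$ in $W^{\perp}$, $\beta - \gamma = E\beta + (\beta - E\beta - \gamma)$. Since $E\beta$ is in W
and $\beta - E\beta - \gamma$ is in $W^{\perp}$, it follows that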




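$$\begin{aligned}
\|\beta - \gamma\|^2 &= \|E\beta\|^2 + \|\beta - E\beta - \gamma\|^2 \\
&\ge \|\beta - (\beta - E\beta)\|^2
\end{aligned}$$

with strict inequality when $\gamma \neq \beta - E\beta$. Therefore, $\beta - E\beta$ is the best
approximation to $\beta$ by vectors in $W^{\perp}$. ∎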


EXAMPLE 14. Give $R^3$ the standard inner product. Then the orthogonal
projection of $(-10, 2, 8)$ on the subspace W that is spanned by
$(3, 12, -1)$ is the vector

$$\begin{aligned}
\alpha &= \frac{((-10, 2, 8)|(3, 12, -1))}{9 + 144 + 1}(3, 12, -1) \\
&= \frac{-14}{154}(3, 12, -1).
\end{aligned}$$

The orthogonal projection of $R^3$ on W is the linear transformation E
defined by

$$(x_1, x_2, x_3) \to \Bigl(\frac{3x_1 + 12x_2 - x_3}{154}\Bigr)(3, 12, -1).$$

The rank of E is clearly 1; hence its nullity is 2. On the other hand,

$$E(x_1, x_2, x_3) = (0, 0, 0)$$

if and only if $3x_1 + 12x_2 - x_3 = 0$. This is the case if and only if $(x_1, x_2, x_3)$
is in $W^{\perp}$. Therefore, $W^{\perp}$ is the null space of E, and $\dim(W^{\perp}) = 2$.
Computing

$$(x_1, x_2, x_3) - \Bigl(\frac{3x_1 + 12x_2 - x_3}{154}\Bigr)(3, 12, -1)$$

we see that the orthogonal projection of $R^3$ on $W^{\perp}$ is the linear transformation
$I - E$ that maps the vector $(x_1, x_2, x_3)$ onto the vector

$$\frac{1}{154}(145x_1 - 36x_2 + 3x_3,\; -36x_1 + 10x_2 + 12x_3,\; 3x_1 + 12x_2 + 153x_3).$$
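A short sketch of Example 14 in exact arithmetic, assuming matrices stored as lists of rows (the helper names are illustrative). It builds the matrix of E as $vv^t/\|v\|^2$ with $v = (3, 12, -1)$, recovers the projection of $(-10, 2, 8)$, reproduces the entries of $I - E$ found above (multiplied by 154), and confirms that both E and $I - E$ are idempotent, anticipating Theorem 5 and its corollary.

```python
from fractions import Fraction

v = (3, 12, -1)             # W is spanned by v, as in Example 14
nv = sum(x * x for x in v)  # ||v||^2 = 9 + 144 + 1 = 154

# E x = ((x|v) / ||v||^2) v, so the matrix of E is v v^t / ||v||^2.
E = [[Fraction(v[i] * v[j], nv) for j in range(3)] for i in range(3)]
I_minus_E = [[(1 if i == j else 0) - E[i][j] for j in range(3)] for i in range(3)]


def matmul(A, B):
    """Product of two 3 x 3 matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def image(A, x):
    """Image of the column vector x under the 3 x 3 matrix A."""
    return [sum(A[i][k] * x[k] for k in range(3)) for i in range(3)]


print(image(E, (-10, 2, 8)) == [Fraction(-14 * x, 154) for x in v])  # True
print([[int(154 * x) for x in row] for row in I_minus_E])
# [[145, -36, 3], [-36, 10, 12], [3, 12, 153]]
print(matmul(E, E) == E, matmul(I_minus_E, I_minus_E) == I_minus_E)  # True True
```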


The observations made in Example 14 generalize in the following 
fashion. 


Theorem 5. Let W be a finite-dimensional subspace of an inner product
space V and let E be the orthogonal projection of V on W. Then E is an idempotent
linear transformation of V onto W, $W^{\perp}$ is the null space of E, and

$$V = W \oplus W^{\perp}.$$

Proof. Let $\beta$ be an arbitrary vector in V. Then $E\beta$ is the best
approximation to $\beta$ that lies in W. In particular, $E\beta = \beta$ when $\beta$ is in W.
Therefore, $E(E\beta) = E\beta$ for every $\beta$ in V; that is, E is idempotent: $E^2 = E$.
To prove that E is a linear transformation, let $\alpha$ and $\beta$ be any vectors in
V and c an arbitrary scalar. Then, by Theorem 4, $\alpha - E\alpha$ and $\beta - E\beta$
are each orthogonal to every vector in W. Hence the vector

$$c(\alpha - E\alpha) + (\beta - E\beta) = (c\alpha + \beta) - (cE\alpha + E\beta)$$

also belongs to $W^{\perp}$. Since $cE\alpha + E\beta$ is a vector in W, it follows from
Theorem 4 that

$$E(c\alpha + \beta) = cE\alpha + E\beta.$$

Of course, one may also prove the linearity of E by using (8-11). Again
let $\beta$ be any vector in V. Then $E\beta$ is the unique vector in W such that
$\beta - E\beta$ is in $W^{\perp}$. Thus $E\beta = 0$ when $\beta$ is in $W^{\perp}$. Conversely, $\beta$ is in $W^{\perp}$
when $E\beta = 0$. Thus $W^{\perp}$ is the null space of E. The equation

$$\beta = E\beta + (\beta - E\beta)$$

shows that $V = W + W^{\perp}$; moreover, $W \cap W^{\perp} = \{0\}$. For if $\alpha$ is a
vector in $W \cap W^{\perp}$, then $(\alpha|\alpha) = 0$. Therefore, $\alpha = 0$, and V is the direct
sum of W and $W^{\perp}$. ∎


Corollary. Under the conditions of the theorem, $I - E$ is the orthogonal
projection of V on $W^{\perp}$. It is an idempotent linear transformation of V onto
$W^{\perp}$ with null space W.


Proof. We have already seen that the mapping $\beta \to \beta - E\beta$ is
the orthogonal projection of V on $W^{\perp}$. Since E is a linear transformation,
this projection on $W^{\perp}$ is the linear transformation $I - E$. From its geometric
properties one sees that $I - E$ is an idempotent transformation
of V onto $W^{\perp}$. This also follows from the computation

$$(I - E)(I - E) = I - E - E + E^2 = I - E.$$

Moreover, $(I - E)\beta = 0$ if and only if $\beta = E\beta$, and this is the case if and
only if $\beta$ is in W. Therefore W is the null space of $I - E$. ∎


The Gram-Schmidt process may now be described geometrically in
the following way. Given an inner product space V and vectors $\beta_1, \ldots, \beta_n$
in V, let $P_k$ $(k > 1)$ be the orthogonal projection of V on the orthogonal
complement of the subspace spanned by $\beta_1, \ldots, \beta_{k-1}$, and set $P_1 = I$.
Then the vectors one obtains by applying the orthogonalization process
to $\beta_1, \ldots, \beta_n$ are defined by the equations

$$\alpha_k = P_k\beta_k, \quad 1 \le k \le n. \tag{8-12}$$


Theorem 5 implies another result known as Bessel’s inequality. 


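Corollary. Let $\{\alpha_1, \ldots, \alpha_n\}$ be an orthogonal set of non-zero vectors
in an inner product space V. If $\beta$ is any vector in V, then

$$\sum_k \frac{|(\beta|\alpha_k)|^2}{\|\alpha_k\|^2} \le \|\beta\|^2$$

and equality holds if and only if

$$\beta = \sum_k \frac{(\beta|\alpha_k)}{\|\alpha_k\|^2}\,\alpha_k.$$

Proof. Let $\gamma = \sum_k [(\beta|\alpha_k)/\|\alpha_k\|^2]\,\alpha_k$. Then $\beta = \gamma + \delta$ where
$(\gamma|\delta) = 0$, since by Theorem 4 $\gamma$ is the orthogonal projection of $\beta$ on the
subspace spanned by $\alpha_1, \ldots, \alpha_n$. Hence

$$\|\beta\|^2 = \|\gamma\|^2 + \|\delta\|^2.$$

It now suffices to prove that

$$\|\gamma\|^2 = \sum_k \frac{|(\beta|\alpha_k)|^2}{\|\alpha_k\|^2}.$$

This is straightforward computation in which one uses the fact that
$(\alpha_j|\alpha_k) = 0$ for $j \neq k$. ∎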


In the special case in which $\{\alpha_1, \ldots, \alpha_n\}$ is an orthonormal set,
Bessel’s inequality says that

$$\sum_k |(\beta|\alpha_k)|^2 \le \|\beta\|^2.$$

The corollary also tells us in this case that $\beta$ is in the subspace spanned by
$\alpha_1, \ldots, \alpha_n$ if and only if

$$\beta = \sum_k (\beta|\alpha_k)\,\alpha_k$$

or if and only if Bessel’s inequality is actually an equality. Of course, in
the event that V is finite-dimensional and $\{\alpha_1, \ldots, \alpha_n\}$ is an orthonormal
basis for V, the above formula holds for every vector $\beta$ in V. In other
words, if $\{\alpha_1, \ldots, \alpha_n\}$ is an orthonormal basis for V, the kth coordinate
of $\beta$ in the ordered basis $\{\alpha_1, \ldots, \alpha_n\}$ is $(\beta|\alpha_k)$.
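Both halves of the orthonormal case can be seen in a tiny exact computation. The sketch assumes the standard inner product on $R^3$ and uses, as an illustrative orthonormal set, the first two vectors $\tfrac{1}{5}(3, 0, 4)$, $\tfrac{1}{5}(-4, 0, 3)$ of the orthonormal basis from Example 12.

```python
from fractions import Fraction


def inner(u, v):
    """Standard inner product on R^n."""
    return sum(x * y for x, y in zip(u, v))


# Orthonormal set {(1/5)(3, 0, 4), (1/5)(-4, 0, 3)} (illustrative choice).
u1 = [Fraction(3, 5), Fraction(0), Fraction(4, 5)]
u2 = [Fraction(-4, 5), Fraction(0), Fraction(3, 5)]


def bessel_sum(beta):
    """Left-hand side of Bessel's inequality: sum_k |(beta|alpha_k)|^2."""
    return sum(inner(beta, u) ** 2 for u in (u1, u2))


for beta in [(1, 2, 3), (1, 0, 3)]:
    print(beta, bessel_sum(beta), inner(beta, beta))
# (1, 2, 3) 10 14   strict: beta is not in the span of u1, u2
# (1, 0, 3) 10 10   equality: beta lies in that span
```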


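EXAMPLE 15. We shall apply the last corollary to the orthogonal
sets described in Example 11. We find that

(a) $\displaystyle \sum_{k=-n}^{n} \Bigl|\int_0^1 f(t)e^{-2\pi ikt}\,dt\Bigr|^2 \le \int_0^1 |f(t)|^2\,dt$

(b) $\displaystyle \int_0^1 \Bigl|\sum_{k=-n}^{n} c_ke^{2\pi ikt}\Bigr|^2\,dt = \sum_{k=-n}^{n} |c_k|^2$

(c) $\displaystyle \int_0^1 (\sqrt{2}\cos 2\pi t + \sqrt{2}\sin 4\pi t)^2\,dt = 1 + 1 = 2.$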


Exercises 


1. Consider $R^4$ with the standard inner product. Let W be the subspace of
$R^4$ consisting of all vectors which are orthogonal to both $\alpha = (1, 0, -1, 1)$ and
$\beta = (2, 3, -1, 2)$. Find a basis for W.
2. Apply the Gram-Schmidt process to the vectors $\beta_1 = (1, 0, 1)$, $\beta_2 = (1, 0, -1)$,
$\beta_3 = (0, 3, 4)$, to obtain an orthonormal basis for $R^3$ with the standard inner
product.


3. Consider $C^3$, with the standard inner product. Find an orthonormal basis for
the subspace spanned by $\beta_1 = (1, 0, i)$ and $\beta_2 = (2, 1, 1 + i)$.
4. Let V be an inner product space. The distance between two vectors $\alpha$ and $\beta$
in V is defined by
$$d(\alpha, \beta) = \|\alpha - \beta\|.$$
Show that
(a) $d(\alpha, \beta) \ge 0$;
(b) $d(\alpha, \beta) = 0$ if and only if $\alpha = \beta$;
(c) $d(\alpha, \beta) = d(\beta, \alpha)$;
(d) $d(\alpha, \beta) \le d(\alpha, \gamma) + d(\gamma, \beta)$.
5. Let V be an inner product space, and let $\alpha$, $\beta$ be vectors in V. Show that
$\alpha = \beta$ if and only if $(\alpha|\gamma) = (\beta|\gamma)$ for every $\gamma$ in V.


6. Let W be the subspace of $R^2$ spanned by the vector $(3, 4)$. Using the standard
inner product, let E be the orthogonal projection of $R^2$ onto W. Find
(a) a formula for $E(x_1, x_2)$;
(b) the matrix of E in the standard ordered basis;
(c) $W^{\perp}$;
(d) an orthonormal basis in which E is represented by the matrix

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.$$

7. Let V be the inner product space consisting of $R^2$ and the inner product
whose quadratic form is defined by
$$\|(x_1, x_2)\|^2 = (x_1 - x_2)^2 + 3x_2^2.$$


Let E be the orthogonal projection of V onto the subspace W spanned by the
vector $(3, 4)$. Now answer the four questions of Exercise 6.


8. Find an inner product on $R^2$ such that $(\epsilon_1|\epsilon_2) = 2$.


9. Let V be the subspace of $R[x]$ of polynomials of degree at most 3. Equip V
with the inner product

$$(f|g) = \int_0^1 f(t)g(t)\,dt.$$

(a) Find the orthogonal complement of the subspace of scalar polynomials.
(b) Apply the Gram-Schmidt process to the basis $\{1, x, x^2, x^3\}$.


10. Let V be the vector space of all $n \times n$ matrices over C, with the inner product
$(A|B) = \operatorname{tr}(AB^*)$. Find the orthogonal complement of the subspace of diagonal
matrices.


11. Let V be a finite-dimensional inner product space, and let $\{\alpha_1, \ldots, \alpha_n\}$ be
an orthonormal basis for V. Show that for any vectors $\alpha$, $\beta$ in V

$$(\alpha|\beta) = \sum_k (\alpha|\alpha_k)\,\overline{(\beta|\alpha_k)}.$$
12. Let W be a finite-dimensional subspace of an inner product space V, and let E
be the orthogonal projection of V on W. Prove that $(E\alpha|\beta) = (\alpha|E\beta)$ for all $\alpha$, $\beta$
in V.


13. Let S be a subset of an inner product space V. Show that $(S^{\perp})^{\perp}$ contains the
subspace spanned by S. When V is finite-dimensional, show that $(S^{\perp})^{\perp}$ is the subspace
spanned by S.


14. Let V be a finite-dimensional inner product space, and let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$
be an orthonormal basis for V. Let T be a linear operator on V and A the matrix
of T in the ordered basis $\mathcal{B}$. Prove that

$$A_{kj} = (T\alpha_j|\alpha_k).$$


15. Suppose $V = W_1 \oplus W_2$ and that $f_1$ and $f_2$ are inner products on $W_1$ and $W_2$,
respectively. Show that there is a unique inner product f on V such that

(a) $W_2 = W_1^{\perp}$;

(b) $f(\alpha, \beta) = f_k(\alpha, \beta)$, when $\alpha$, $\beta$ are in $W_k$, $k = 1, 2$.
16. Let V be an inner product space and W a finite-dimensional subspace of V.
There are (in general) many projections which have W as their range. One of
these, the orthogonal projection on W, has the property that $\|E\alpha\| \le \|\alpha\|$ for
every $\alpha$ in V. Prove that if E is a projection with range W, such that $\|E\alpha\| \le \|\alpha\|$
for all $\alpha$ in V, then E is the orthogonal projection on W.


17. Let V be the real inner product space consisting of the space of real-valued
continuous functions on the interval $-1 \le t \le 1$, with the inner product

$$(f|g) = \int_{-1}^{1} f(t)g(t)\,dt.$$

Let W be the subspace of odd functions, i.e., functions satisfying $f(-t) = -f(t)$.
Find the orthogonal complement of W.


8.3. Linear Functionals and Adjoints 


The first portion of this section treats linear functionals on an inner
product space and their relation to the inner product. The basic result is
that any linear functional f on a finite-dimensional inner product space
is ‘inner product with a fixed vector in the space’, i.e., that such an f has
the form $f(\alpha) = (\alpha|\beta)$ for some fixed $\beta$ in V. We use this result to prove
the existence of the ‘adjoint’ of a linear operator T on V, this being a linear
operator $T^*$ such that $(T\alpha|\beta) = (\alpha|T^*\beta)$ for all $\alpha$ and $\beta$ in V. Through the
use of an orthonormal basis, this adjoint operation on linear operators
(passing from T to $T^*$) is identified with the operation of forming the
conjugate transpose of a matrix. We explore slightly the analogy between
the adjoint operation and conjugation on complex numbers.

Let V be any inner product space, and let $\beta$ be some fixed vector in V.
We define a function $f_\beta$ from V into the scalar field by

$$f_\beta(\alpha) = (\alpha|\beta).$$

This function $f_\beta$ is a linear functional on V, because, by its very definition,
$(\alpha|\beta)$ is linear as a function of $\alpha$. If V is finite-dimensional, every linear
functional on V arises in this way from some $\beta$.


Theorem 6. Let V be a finite-dimensional inner product space, and f a
linear functional on V. Then there exists a unique vector β in V such that
f(α) = (α|β) for all α in V.

Proof. Let {α₁, α₂, ..., αₙ} be an orthonormal basis for V. Put

$$\beta = \sum_{j=1}^{n} \overline{f(\alpha_j)}\,\alpha_j \tag{8-13}$$

and let f_β be the linear functional defined by

$$f_\beta(\alpha) = (\alpha|\beta).$$

Then

$$f_\beta(\alpha_k) = \Big(\alpha_k \,\Big|\, \sum_{j} \overline{f(\alpha_j)}\,\alpha_j\Big) = f(\alpha_k).$$

Since this is true for each αₖ, it follows that f = f_β. Now suppose γ is a
vector in V such that (α|β) = (α|γ) for all α. Taking α = β − γ gives
(β − γ|β − γ) = 0, and so β = γ. Thus there is exactly one vector β
determining the linear functional f in the stated manner. ∎


The proof of this theorem can be reworded slightly, in terms of the
representation of linear functionals in a basis. If we choose an
orthonormal basis {α₁, ..., αₙ} for V, the inner product of α = x₁α₁ + ⋯ +
xₙαₙ and β = y₁α₁ + ⋯ + yₙαₙ will be

$$(\alpha|\beta) = x_1\bar y_1 + \cdots + x_n\bar y_n.$$

If f is any linear functional on V, then f has the form

$$f(\alpha) = c_1x_1 + \cdots + c_nx_n$$

for some fixed scalars c₁, ..., cₙ determined by the basis. Of course
cⱼ = f(αⱼ). If we wish to find a vector β in V such that (α|β) = f(α) for all α,
then clearly the coordinates yⱼ of β must satisfy ȳⱼ = cⱼ, that is, yⱼ must be
the complex conjugate of f(αⱼ). Accordingly,

$$\beta = \overline{f(\alpha_1)}\,\alpha_1 + \cdots + \overline{f(\alpha_n)}\,\alpha_n$$

is the desired vector.
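The coordinate form of Theorem 6 translates directly into a few lines of code. The sketch below is only an illustration. It assumes V = C³ with the standard inner product, so that the standard basis is orthonormal, and a functional given by its values cⱼ = f(αⱼ). The helper names `inner` and `representing_vector` are ours.

```python
def inner(x, y):
    """Standard inner product on C^n: (x|y) = sum_k x_k * conj(y_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def representing_vector(c):
    """Given f(x) = c_1 x_1 + ... + c_n x_n in orthonormal coordinates,
    return the coordinates y_j = conj(c_j) of the vector beta that
    satisfies f(alpha) = (alpha|beta) for every alpha."""
    return [complex(cj).conjugate() for cj in c]


# A linear functional on C^3, specified by its values c_j = f(alpha_j).
c = [2 + 1j, -1j, 3]


def f(x):
    return sum(cj * xj for cj, xj in zip(c, x))


beta = representing_vector(c)
alpha = [1 - 2j, 4, 0.5j]
print(f(alpha), inner(alpha, beta))  # the two values agree
```

The conjugation in `representing_vector` is needed because the inner product is conjugate-linear in its second argument.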

Some further comments are in order. The proof of Theorem 6 that
we have given is admirably brief, but it fails to emphasize the essential
geometric fact that β lies in the orthogonal complement of the null space
of f. Let W be the null space of f. Then V = W ⊕ W⊥, and f is completely
determined by its values on W⊥. In fact, if P is the orthogonal projection
of V on W⊥, then

$$f(\alpha) = f(P\alpha)$$

for all α in V, since α − Pα lies in (W⊥)⊥ = W. Suppose f ≠ 0. Then f is of
rank 1 and dim (W⊥) = 1. If γ is any non-zero vector in W⊥, it follows that

$$P\alpha = \frac{(\alpha|\gamma)}{\|\gamma\|^2}\,\gamma$$

for all α in V. Thus

$$f(\alpha) = (\alpha|\gamma)\,\frac{f(\gamma)}{\|\gamma\|^2}$$

for all α, and

$$\beta = \frac{\overline{f(\gamma)}}{\|\gamma\|^2}\,\gamma.$$


EXAMPLE 16. We should give one example showing that Theorem 6
is not true without the assumption that V is finite dimensional. Let V be
the vector space of polynomials over the field of complex numbers, with
the inner product

$$(f|g) = \int_0^1 f(t)\overline{g(t)}\,dt.$$


This inner product can also be defined algebraically. If f = Σ aₖxᵏ and
g = Σ bₖxᵏ, then

$$(f|g) = \sum_{j,k} \frac{1}{j+k+1}\,a_j\overline{b_k}.$$
Let z be a fixed complex number and let L be the linear functional
‘evaluation at z’:

$$L(f) = f(z).$$


Is there a polynomial g such that (f|g) = L(f) for every f? The answer is
no; for suppose we have

$$f(z) = \int_0^1 f(t)\overline{g(t)}\,dt$$

for every f. Let h = x − z, so that for any f we have (hf)(z) = 0. Then

$$0 = \int_0^1 h(t)f(t)\overline{g(t)}\,dt$$

for all f. In particular this holds when f = h̄g, where h̄ is the polynomial
whose coefficients are the complex conjugates of those of h, so that

$$\int_0^1 |h(t)|^2\,|g(t)|^2\,dt = 0$$

and so hg = 0. Since h ≠ 0, it must be that g = 0. But L is not the zero
functional; hence, no such g exists.

One can generalize the example somewhat, to the case where L is a
linear combination of point evaluations. Suppose we select fixed distinct
complex numbers z₁, ..., zₙ and scalars c₁, ..., cₙ and let

$$L(f) = c_1f(z_1) + \cdots + c_nf(z_n).$$

Then L is a linear functional on V, but there is no g with L(f) = (f|g),
unless c₁ = c₂ = ⋯ = cₙ = 0. Just repeat the above argument with
h = (x − z₁)⋯(x − zₙ).
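The algebraic description of the inner product in Example 16 can be checked numerically against the integral. This is a minimal sketch. It assumes polynomials are stored as coefficient lists [a₀, a₁, ...]; that convention and the function names are ours. The midpoint rule serves only as an independent check on the formula.

```python
def poly_eval(a, t):
    """Evaluate sum_k a[k] * t**k."""
    return sum(ak * t**k for k, ak in enumerate(a))


def poly_inner(a, b):
    """(f|g) = sum_{j,k} a_j conj(b_k) / (j + k + 1), which equals the
    integral of f(t) conj(g(t)) over [0, 1]."""
    return sum(aj * complex(bk).conjugate() / (j + k + 1)
               for j, aj in enumerate(a) for k, bk in enumerate(b))


def midpoint_inner(a, b, n=20000):
    """Midpoint-rule approximation of the same integral, for comparison."""
    h = 1.0 / n
    total = 0
    for i in range(n):
        t = (i + 0.5) * h
        total += poly_eval(a, t) * complex(poly_eval(b, t)).conjugate()
    return total * h


f = [1, 2j]        # f = 1 + 2i x
g = [0, 1, -1j]    # g = x - i x^2
print(poly_inner(f, g), midpoint_inner(f, g))  # both approximately 1j
```

The check only illustrates the inner product. It makes no attempt to test the non-existence argument, which is genuinely infinite-dimensional.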

We turn now to the concept of the adjoint of a linear operator. 


Theorem 7. For any linear operator T on a finite-dimensional inner
product space V, there exists a unique linear operator T* on V such that

$$(T\alpha|\beta) = (\alpha|T^*\beta) \tag{8-14}$$

for all α, β in V.


Proof. Let β be any vector in V. Then α → (Tα|β) is a linear
functional on V. By Theorem 6 there is a unique vector β′ in V such that
(Tα|β) = (α|β′) for every α in V. Let T* denote the mapping β → β′:

$$\beta' = T^*\beta.$$

We have (8-14), but we must verify that T* is a linear operator. Let β, γ
be in V and let c be a scalar. Then for any α,

$$\begin{aligned}
(\alpha|T^*(c\beta+\gamma)) &= (T\alpha|c\beta+\gamma)\\
&= (T\alpha|c\beta) + (T\alpha|\gamma)\\
&= \bar c(T\alpha|\beta) + (T\alpha|\gamma)\\
&= \bar c(\alpha|T^*\beta) + (\alpha|T^*\gamma)\\
&= (\alpha|cT^*\beta) + (\alpha|T^*\gamma)\\
&= (\alpha|cT^*\beta + T^*\gamma).
\end{aligned}$$

Thus T*(cβ + γ) = cT*β + T*γ and T* is linear.

The uniqueness of T* is clear. For any β in V, the vector T*β is
uniquely determined as the vector β′ such that (Tα|β) = (α|β′) for
every α. ∎


Theorem 8. Let V be a finite-dimensional inner product space and let
ℬ = {α₁, ..., αₙ} be an (ordered) orthonormal basis for V. Let T be a
linear operator on V and let A be the matrix of T in the ordered basis ℬ. Then

$$A_{kj} = (T\alpha_j|\alpha_k).$$


Proof. Since ℬ is an orthonormal basis, we have

$$\alpha = \sum_{k=1}^{n} (\alpha|\alpha_k)\,\alpha_k.$$

The matrix A is defined by

$$T\alpha_j = \sum_{k=1}^{n} A_{kj}\,\alpha_k$$

and since

$$T\alpha_j = \sum_{k=1}^{n} (T\alpha_j|\alpha_k)\,\alpha_k$$

we have Aₖⱼ = (Tαⱼ|αₖ), because the coordinates of Tαⱼ in the basis ℬ are unique. ∎
Corollary. Let V be a finite-dimensional inner product space, and let
T be a linear operator on V. In any orthonormal basis for V, the matrix of T*
is the conjugate transpose of the matrix of T.


Proof. Let ℬ = {α₁, ..., αₙ} be an orthonormal basis for V, let
A = [T]_ℬ and B = [T*]_ℬ. According to Theorem 8,

$$A_{kj} = (T\alpha_j|\alpha_k), \qquad B_{kj} = (T^*\alpha_j|\alpha_k).$$

By the definition of T* we then have

$$B_{kj} = (T^*\alpha_j|\alpha_k) = \overline{(\alpha_k|T^*\alpha_j)} = \overline{(T\alpha_k|\alpha_j)} = \overline{A_{jk}}. \qquad\blacksquare$$
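Theorem 8 and its corollary can be illustrated numerically. This sketch assumes V = C² with the standard inner product and an operator T given by a matrix `M` in the standard basis, so that, by Example 18, T* is given by the conjugate transpose of `M`. It also uses the orthonormal basis α₁ = (1, i)/√2, α₂ = (1, −i)/√2. These choices and the helper names are ours.

```python
import math


def inner(x, y):
    """Standard inner product on C^n: (x|y) = sum_k x_k * conj(y_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def conj_transpose(M):
    return [[complex(M[j][i]).conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]


def matrix_in_basis(M, basis):
    """A[k][j] = (T alpha_j | alpha_k); this is [T] only for an orthonormal basis."""
    n = len(basis)
    return [[inner(apply(M, basis[j]), basis[k]) for j in range(n)]
            for k in range(n)]


s = 1 / math.sqrt(2)
basis = [[s, 1j * s], [s, -1j * s]]   # orthonormal basis of C^2
M = [[1 + 2j, 3], [-1j, 4 - 1j]]      # T(x) = Mx in the standard basis

A = matrix_in_basis(M, basis)                   # [T] in this basis
B = matrix_in_basis(conj_transpose(M), basis)   # [T*] in this basis
print(B)
print(conj_transpose(A))  # agrees with B up to rounding
```

For a basis that is not orthonormal, `matrix_in_basis` would not return the matrix of T, which is exactly the point of Example 17.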


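EXAMPLE 17. Let V be a finite-dimensional inner product space and
E the orthogonal projection of V on a subspace W. Then for any vectors
α and β in V,

$$\begin{aligned}
(E\alpha|\beta) &= (E\alpha|E\beta + (1-E)\beta)\\
&= (E\alpha|E\beta)\\
&= (E\alpha + (1-E)\alpha|E\beta)\\
&= (\alpha|E\beta),
\end{aligned}$$

where the second and third equalities hold because Eα and Eβ lie in W,
while (1 − E)α and (1 − E)β lie in W⊥.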


From the uniqueness of the operator E* it follows that E* = E. Now
consider the projection E described in Example 14. Then

$$A = \frac{1}{154}\begin{bmatrix} 9 & 36 & -3\\ 36 & 144 & -12\\ -3 & -12 & 1 \end{bmatrix}$$

is the matrix of E in the standard orthonormal basis. Since E = E*, A is
also the matrix of E*, and because A = A*, this does not contradict the
preceding corollary. On the other hand, suppose


α₁ = (154, 0, 0)
α₂ = (145, −36, 3)
α₃ = (−36, 10, 12).

Then {α₁, α₂, α₃} is a basis, and

Eα₁ = (9, 36, −3)
Eα₂ = (0, 0, 0)
Eα₃ = (0, 0, 0).

Since (9, 36, −3) = (154, 0, 0) − (145, −36, 3), the matrix B of E in
the basis {α₁, α₂, α₃} is defined by the equation

$$B = \begin{bmatrix} 1 & 0 & 0\\ -1 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$

In this case B ≠ B*, and B* is not the matrix of E* = E in the basis
{α₁, α₂, α₃}. Applying the corollary, we conclude that {α₁, α₂, α₃} is not
an orthonormal basis. Of course this is quite obvious anyway.
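The claims of Example 17 can be verified in exact rational arithmetic with Python's `fractions` module. The sketch avoids inverting the change-of-basis matrix. With the columns of P equal to α₁, α₂, α₃, the statement that B is the matrix of E in that basis is equivalent to AP = PB. The matrix A equals vvᵗ/154 with v = (3, 12, −1), so E is the projection on the line spanned by v.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


# Matrix of E (projection on the line spanned by (3, 12, -1)) in the standard basis.
A = [[Fraction(v, 154) for v in row]
     for row in [[9, 36, -3], [36, 144, -12], [-3, -12, 1]]]

# Columns of P are the basis vectors alpha_1, alpha_2, alpha_3.
alphas = [[154, 0, 0], [145, -36, 3], [-36, 10, 12]]
P = transpose([[Fraction(v) for v in a] for a in alphas])

# Matrix of E in the basis {alpha_1, alpha_2, alpha_3}.
B = [[1, 0, 0], [-1, 0, 0], [0, 0, 0]]

print(A == transpose(A))             # True: A is real symmetric, so A = A*
print(matmul(A, A) == A)             # True: E is idempotent
print(matmul(A, P) == matmul(P, B))  # True: B represents E in the alpha basis
print(B == transpose(B))             # False: B is not self-adjoint
```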


Definition. Let T be a linear operator on an inner product space V.
Then we say that T has an adjoint on V if there exists a linear operator T*
on V such that (Tα|β) = (α|T*β) for all α and β in V.


By Theorem 7 every linear operator on a finite-dimensional inner
product space V has an adjoint on V. In the infinite-dimensional case this
is not always true. But in any case there is at most one such operator T*;
when it exists, we call it the adjoint of T.

Two comments should be made about the finite-dimensional case. 


1. The adjoint of T depends not only on T but on the inner product
as well.

2. As shown by Example 17, in an arbitrary ordered basis ℬ, the
relation between [T]_ℬ and [T*]_ℬ is more complicated than that given in
the corollary above.


EXAMPLE 18. Let V be Cⁿˣ¹, the space of complex n × 1 matrices,
with inner product (X|Y) = Y*X. If A is an n × n matrix with complex
entries, the adjoint of the linear operator X → AX is the operator
X → A*X. For

$$(AX|Y) = Y^*AX = (A^*Y)^*X = (X|A^*Y).$$


The reader should convince himself that this is really a special case of the 
last corollary. 


EXAMPLE 19. This is similar to Example 18. Let V be Cⁿˣⁿ with the
inner product (A|B) = tr (B*A). Let M be a fixed n × n matrix over C.
The adjoint of left multiplication by M is left multiplication by M*. Of
course, ‘left multiplication by M’ is the linear operator L_M defined by
L_M(A) = MA. Then

$$\begin{aligned}
(L_M(A)|B) &= \operatorname{tr}(B^*(MA))\\
&= \operatorname{tr}(MAB^*)\\
&= \operatorname{tr}(AB^*M)\\
&= \operatorname{tr}(A(M^*B)^*)\\
&= (A|L_{M^*}(B)).
\end{aligned}$$

Thus (L_M)* = L_{M*}. In the computation above, we twice used the
characteristic property of the trace function: tr (AB) = tr (BA).
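Example 19 can be spot-checked on small matrices. This sketch uses arbitrarily chosen 2 × 2 complex matrices of our own and compares (L_M(A)|B) with (A|L_{M*}(B)) for the inner product (A|B) = tr(B*A).

```python
def matmul(X, Y):
    """Matrix product XY for lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def adjoint(X):
    """Conjugate transpose X*."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def tr_inner(A, B):
    """The inner product (A|B) = tr(B* A) of Example 19."""
    return trace(matmul(adjoint(B), A))


M = [[1 + 1j, 2], [0, -3j]]
A = [[2, 1j], [1 - 1j, 0]]
B = [[0.5, -1], [2j, 3 + 1j]]

lhs = tr_inner(matmul(M, A), B)            # (L_M(A) | B)
rhs = tr_inner(A, matmul(adjoint(M), B))   # (A | L_{M*}(B))
print(lhs, rhs)  # equal up to rounding
```

Note that tr(A*A) is the sum of the squared moduli of the entries of A, which is why this is a genuine inner product.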


EXAMPLE 20. Let V be the space of polynomials over the field of
complex numbers, with the inner product

$$(f|g) = \int_0^1 f(t)\overline{g(t)}\,dt.$$

If f is a polynomial, f = Σ aₖxᵏ, we let f̄ = Σ āₖxᵏ. That is, f̄ is the
polynomial whose associated polynomial function is the complex conjugate
of that for f:

$$\bar f(t) = \overline{f(t)}, \qquad t \text{ real}.$$

Consider the operator ‘multiplication by f,’ that is, the linear operator
M_f defined by M_f(g) = fg. Then this operator has an adjoint, namely,
multiplication by f̄. For

$$\begin{aligned}
(M_f(g)|h) &= (fg|h)\\
&= \int_0^1 f(t)g(t)\overline{h(t)}\,dt\\
&= \int_0^1 g(t)\,\overline{\bar f(t)h(t)}\,dt\\
&= (g|\bar f h)\\
&= (g|M_{\bar f}(h))
\end{aligned}$$

and so (M_f)* = M_f̄.


EXAMPLE 21. In Example 20, we saw that some linear operators on
an infinite-dimensional inner product space do have an adjoint. As we
commented earlier, some do not. Let V be the inner product space of
Example 20, and let D be the differentiation operator on C[x].
Integration by parts shows that

$$(Df|g) = f(1)\overline{g(1)} - f(0)\overline{g(0)} - (f|Dg).$$

Let us fix g and inquire when there is a polynomial D*g such that
(Df|g) = (f|D*g) for all f. If such a D*g exists, we shall have

$$(f|D^*g) = f(1)\overline{g(1)} - f(0)\overline{g(0)} - (f|Dg)$$

or

$$(f|D^*g + Dg) = f(1)\overline{g(1)} - f(0)\overline{g(0)}.$$

With g fixed, L(f) = f(1)\overline{g(1)} − f(0)\overline{g(0)} is a linear functional of the type
considered in Example 16 and cannot be of the form L(f) = (f|h) unless
L = 0. If D*g exists, then with h = D*g + Dg we do have L(f) = (f|h),
and so g(0) = g(1) = 0. The existence of a suitable polynomial D*g implies
g(0) = g(1) = 0. Conversely, if g(0) = g(1) = 0, the polynomial D*g =
−Dg satisfies (Df|g) = (f|D*g) for all f. If we choose any g for which
g(0) ≠ 0 or g(1) ≠ 0, we cannot suitably define D*g, and so we conclude
that D has no adjoint.
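The integration-by-parts identity of Example 21 can be checked with the algebraic form of the inner product from Example 16. In this sketch polynomials are coefficient lists [a₀, a₁, ...], and the helper names are ours. The final lines illustrate that D*g = −Dg works when g(0) = g(1) = 0.

```python
def poly_eval(a, t):
    """Evaluate sum_k a[k] * t**k."""
    return sum(ak * t**k for k, ak in enumerate(a))


def poly_inner(a, b):
    """(f|g) = integral over [0, 1] of f(t) conj(g(t)), computed algebraically."""
    return sum(aj * complex(bk).conjugate() / (j + k + 1)
               for j, aj in enumerate(a) for k, bk in enumerate(b))


def derivative(a):
    """Coefficients of Df for f = sum a_k x^k."""
    return [k * ak for k, ak in enumerate(a)][1:]


def boundary_term(f, g):
    """f(1) conj(g(1)) - f(0) conj(g(0))."""
    return (poly_eval(f, 1) * complex(poly_eval(g, 1)).conjugate()
            - poly_eval(f, 0) * complex(poly_eval(g, 0)).conjugate())


f = [1, 2j, -3]          # 1 + 2i x - 3x^2
g = [2 - 1j, 0, 1j, 4]   # (2 - i) + i x^2 + 4x^3
lhs = poly_inner(derivative(f), g)
rhs = boundary_term(f, g) - poly_inner(f, derivative(g))
print(lhs, rhs)  # equal up to rounding (integration by parts)

# With g0 = x - x^2 we have g0(0) = g0(1) = 0, so D*g0 = -Dg0 works.
g0 = [0, 1, -1]
minus_Dg0 = [-c for c in derivative(g0)]
print(poly_inner(derivative(f), g0), poly_inner(f, minus_Dg0))
```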
We hope that these examples enhance the reader’s understanding of
the adjoint of a linear operator. We see that the adjoint operation, passing 
from T to T*, behaves somewhat like conjugation on complex numbers. 
The following theorem strengthens the analogy. 


Theorem 9. Let V be a finite-dimensional inner product space. If T
and U are linear operators on V and c is a scalar,


(i) (T + U)* = T* + U*;
(ii) (cT)* = c̄T*;
(iii) (TU)* = U*T*;
(iv) (T*)* = T.


Proof. To prove (i), let α and β be any vectors in V.
Then

$$\begin{aligned}
((T+U)\alpha|\beta) &= (T\alpha + U\alpha|\beta)\\
&= (T\alpha|\beta) + (U\alpha|\beta)\\
&= (\alpha|T^*\beta) + (\alpha|U^*\beta)\\
&= (\alpha|T^*\beta + U^*\beta)\\
&= (\alpha|(T^*+U^*)\beta).
\end{aligned}$$

From the uniqueness of the adjoint we have (T + U)* = T* + U*. We
leave the proof of (ii) to the reader. We obtain (iii) and (iv) from the
relations

$$(TU\alpha|\beta) = (U\alpha|T^*\beta) = (\alpha|U^*T^*\beta)$$

$$(T^*\alpha|\beta) = \overline{(\beta|T^*\alpha)} = \overline{(T\beta|\alpha)} = (\alpha|T\beta). \qquad\blacksquare$$


Theorem 9 is often phrased as follows: The mapping T → T* is a
conjugate-linear anti-isomorphism of period 2. The analogy with complex
conjugation which we mentioned above is, of course, based upon the
observation that complex conjugation has the properties

$$\overline{z_1 + z_2} = \bar z_1 + \bar z_2, \qquad \overline{z_1z_2} = \bar z_1\bar z_2, \qquad \bar{\bar z} = z.$$

One must be careful to observe the reversal
of order in a product, which the adjoint operation imposes: (UT)* =
T*U*. We shall mention extensions of this analogy as we continue our
study of linear operators on an inner product space. We might mention
something along these lines now. A complex number z is real if and only
if z = z̄. One might expect that the linear operators T such that T = T*
behave in some way like the real numbers. This is in fact the case. For
example, if T is a linear operator on a finite-dimensional complex inner
product space, then

$$T = U_1 + iU_2 \tag{8-15}$$

where U₁ = U₁* and U₂ = U₂*. Thus, in some sense, T has a ‘real part’ and
an ‘imaginary part.’ The operators U₁ and U₂ satisfying U₁ = U₁*,
U₂ = U₂*, and (8-15) are unique, and are given by

$$U_1 = \frac{1}{2}(T + T^*), \qquad U_2 = \frac{1}{2i}(T - T^*).$$
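The decomposition (8-15) and the order reversal in Theorem 9 (iii) are easy to test on matrices, using the fact (from the corollary to Theorem 8) that in an orthonormal basis the adjoint corresponds to the conjugate transpose. This is a minimal sketch with arbitrary 2 × 2 complex matrices of our choosing. The matrix `S` plays the role of the second operator.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def adjoint(X):
    """Conjugate transpose X*."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


def close(X, Y, tol=1e-12):
    """Entrywise comparison of two matrices up to rounding."""
    return all(abs(x - y) < tol for rx, ry in zip(X, Y) for x, y in zip(rx, ry))


T = [[1 + 2j, 3 - 1j], [4j, -2]]
S = [[0, 1j], [2, 1 - 1j]]      # a second operator, for the product rule
Ts = adjoint(T)

U1 = [[(T[i][j] + Ts[i][j]) / 2 for j in range(2)] for i in range(2)]
U2 = [[(T[i][j] - Ts[i][j]) / 2j for j in range(2)] for i in range(2)]
T_rebuilt = [[U1[i][j] + 1j * U2[i][j] for j in range(2)] for i in range(2)]

print(close(U1, adjoint(U1)), close(U2, adjoint(U2)))  # both self-adjoint
print(close(T, T_rebuilt))                            # T = U1 + i U2
print(close(adjoint(matmul(T, S)), matmul(adjoint(S), adjoint(T))))  # (TS)* = S*T*
```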


A linear operator T such that T = T* is called self-adjoint (or
Hermitian). If ℬ is an orthonormal basis for V, then

$$[T^*]_{\mathcal B} = [T]_{\mathcal B}^*$$


and so T is self-adjoint if and only if its matrix in every orthonormal basis 
is a self-adjoint matrix. Self-adjoint operators are important, not simply 
because they provide us with some sort of real and imaginary part for the 
general linear operator, but for the following reasons: (1) Self-adjoint 
operators have many special properties. For example, for such an operator 
there is an orthonormal basis of characteristic vectors. (2) Many operators 
which arise in practice are self-adjoint. We shall consider the special 
properties of self-adjoint operators later. 


Exercises 


1. Let V be the space C², with the standard inner product. Let T be the linear
operator defined by Tε₁ = (1, −2), Tε₂ = (i, −1). If α = (x₁, x₂), find T*α.


2. Let T be the linear operator on C² defined by Tε₁ = (1 + i, 2), Tε₂ = (i, i).
Using the standard inner product, find the matrix of T* in the standard ordered
basis. Does T commute with T*?


3. Let V be C³ with the standard inner product. Let T be the linear operator on
V whose matrix in the standard ordered basis is defined by

$$A_{jk} = i^{j+k}, \qquad (i^2 = -1).$$

Find a basis for the null space of T*.


4. Let V be a finite-dimensional inner product space and T a linear operator on V. 
Show that the range of T* is the orthogonal complement of the null space of T. 


5. Let V be a finite-dimensional inner product space and T a linear operator on V. 
If T is invertible, show that T* is invertible and (T*)⁻¹ = (T⁻¹)*.


6. Let V be an inner product space and β, γ fixed vectors in V. Show that
Tα = (α|β)γ defines a linear operator on V. Show that T has an adjoint, and
describe T* explicitly.
Now suppose V is Cⁿ with the standard inner product, β = (y₁, ..., yₙ), and
γ = (x₁, ..., xₙ). What is the j, k entry of the matrix of T in the standard ordered
basis? What is the rank of this matrix?


7. Show that the product of two self-adjoint operators is self-adjoint if and only 
if the two operators commute. 
8. Let V be the vector space of the polynomials over R of degree less than or
equal to 3, with the inner product

$$(f|g) = \int_0^1 f(t)g(t)\,dt.$$

If t is a real number, find the polynomial gₜ in V such that (f|gₜ) = f(t) for all f in V.


9. Let V be the inner product space of Exercise 8, and let D be the differentiation
operator on V. Find D*.


10. Let V be the space of n × n matrices over the complex numbers, with the
inner product (A|B) = tr (AB*). Let P be a fixed invertible matrix in V, and
let T_P be the linear operator on V defined by T_P(A) = P⁻¹AP. Find the adjoint
of T_P.


11. Let V be a finite-dimensional inner product space, and let E be an idempotent
linear operator on V, i.e., E² = E. Prove that E is self-adjoint if and only if
EE* = E*E.


12. Let V be a finite-dimensional complex inner product space, and let T be a 
linear operator on V. Prove that T is self-adjoint if and only if (Tala) is real for 
every a in V. 
8.4. Unitary Operators


In this section, we consider the concept of an isomorphism between
two inner product spaces. If V and W are vector spaces, an isomorphism 
of V onto W is a one-one linear transformation from V onto W, i.e., a 
one-one correspondence between the elements of V and those of W, which 
‘preserves’ the vector space operations. Now an inner product space con- 
sists of a vector space and a specified inner product on that space. Thus, 
when V and W are inner product spaces, we shall require an isomorphism 
from V onto W not only to preserve the linear operations, but also to 
preserve inner products. An isomorphism of an inner product space onto 
itself is called a ‘unitary operator’ on that space. We shall consider various 
examples of unitary operators and establish their basic properties. 


Definition. Let V and W be inner product spaces over the same field,
and let T be a linear transformation from V into W. We say that T
preserves inner products if (Tα|Tβ) = (α|β) for all α, β in V. An
isomorphism of V onto W is a vector space isomorphism T of V onto W which
also preserves inner products.


If T preserves inner products, then ‖Tα‖ = ‖α‖ and so T is
necessarily non-singular. Thus an isomorphism from V onto W can also be
defined as a linear transformation from V onto W which preserves inner
products. If T is an isomorphism of V onto W, then T⁻¹ is an isomorphism
of W onto V; hence, when such a T exists, we shall simply say V and W
are isomorphic. Of course, isomorphism of inner product spaces is an
equivalence relation.


Theorem 10. Let V and W be finite-dimensional inner product spaces 
over the same field, having the same dimension. If T is a linear transformation 
from V into W, the following are equivalent. 


(i) T preserves inner products. 
(ii) T is an (inner product space) isomorphism. 
(iii) T carries every orthonormal basis for V onto an orthonormal basis 
for W. 


(iv) T carries some orthonormal basis for V onto an orthonormal basis 
for W. 


Proof. (i) → (ii) If T preserves inner products, then ‖Tα‖ = ‖α‖
for all α in V. Thus T is non-singular, and since dim V = dim W, we know
that T is a vector space isomorphism.

(ii) → (iii) Suppose T is an isomorphism. Let {α₁, ..., αₙ} be an
orthonormal basis for V. Since T is a vector space isomorphism and
dim W = dim V, it follows that {Tα₁, ..., Tαₙ} is a basis for W. Since T
also preserves inner products, (Tαⱼ|Tαₖ) = (αⱼ|αₖ) = δⱼₖ.

(iii) → (iv) This requires no comment.

(iv) → (i) Let {α₁, ..., αₙ} be an orthonormal basis for V such that
{Tα₁, ..., Tαₙ} is an orthonormal basis for W. Then

$$(T\alpha_j|T\alpha_k) = (\alpha_j|\alpha_k) = \delta_{jk}.$$

For any α = x₁α₁ + ⋯ + xₙαₙ and β = y₁α₁ + ⋯ + yₙαₙ in V, we have

$$(\alpha|\beta) = \sum_{j=1}^{n} x_j\bar y_j$$

$$\begin{aligned}
(T\alpha|T\beta) &= \Big(\sum_j x_jT\alpha_j \,\Big|\, \sum_k y_kT\alpha_k\Big)\\
&= \sum_j\sum_k x_j\bar y_k\,(T\alpha_j|T\alpha_k)\\
&= \sum_{j=1}^{n} x_j\bar y_j
\end{aligned}$$

and so T preserves inner products. ∎


Corollary. Let V and W be finite-dimensional inner product spaces
over the same field. Then V and W are isomorphic if and only if they have
the same dimension.


Proof. An isomorphism of inner product spaces is in particular a vector
space isomorphism, so isomorphic spaces have the same dimension.
Conversely, if {α₁, ..., αₙ} is an orthonormal basis for V and
{β₁, ..., βₙ} is an orthonormal basis for W, let T be the linear
transformation from V into W defined by Tαⱼ = βⱼ. Then T is an isomorphism of
V onto W, by Theorem 10. ∎
EXAMPLE 22. If V is an n-dimensional inner product space, then each
ordered orthonormal basis ℬ = {α₁, ..., αₙ} determines an isomorphism
of V onto Fⁿ with the standard inner product. The isomorphism is simply

$$T(x_1\alpha_1 + \cdots + x_n\alpha_n) = (x_1, \ldots, x_n).$$

There is the superficially different isomorphism which ℬ determines of V
onto the space Fⁿˣ¹ with (X|Y) = Y*X as inner product. The
isomorphism is

$$\alpha \to [\alpha]_{\mathcal B},$$

i.e., the transformation sending α into its coordinate matrix in the ordered
basis ℬ. For any ordered basis ℬ, this is a vector space isomorphism;
however, it is an isomorphism of the two inner product spaces if and only
if ℬ is orthonormal.


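EXAMPLE 23. Here is a slightly less superficial isomorphism. Let W
be the space of all 3 × 3 matrices A over R which are skew-symmetric,
i.e., Aᵗ = −A. We equip W with the inner product (A|B) = ½ tr (ABᵗ),
the ½ being put in as a matter of convenience. Let V be the space R³ with
the standard inner product. Let T be the linear transformation from V
into W defined by

$$T(x_1, x_2, x_3) = \begin{bmatrix} 0 & -x_3 & x_2\\ x_3 & 0 & -x_1\\ -x_2 & x_1 & 0 \end{bmatrix}.$$

Then T maps V onto W, and putting

$$A = \begin{bmatrix} 0 & -x_3 & x_2\\ x_3 & 0 & -x_1\\ -x_2 & x_1 & 0 \end{bmatrix}, \qquad B = \begin{bmatrix} 0 & -y_3 & y_2\\ y_3 & 0 & -y_1\\ -y_2 & y_1 & 0 \end{bmatrix}$$

we have

$$\begin{aligned}
\operatorname{tr}(AB^t) &= x_3y_3 + x_2y_2 + x_3y_3 + x_1y_1 + x_2y_2 + x_1y_1\\
&= 2(x_1y_1 + x_2y_2 + x_3y_3).
\end{aligned}$$

Thus (α|β) = (Tα|Tβ), and T is an isomorphism of inner product spaces. Note that T
carries the standard basis {ε₁, ε₂, ε₃} onto the orthonormal basis consisting
of the three matrices

$$\begin{bmatrix} 0 & 0 & 0\\ 0 & 0 & -1\\ 0 & 1 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & 0 & 1\\ 0 & 0 & 0\\ -1 & 0 & 0 \end{bmatrix}, \qquad \begin{bmatrix} 0 & -1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0 \end{bmatrix}.$$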
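The computation in Example 23 can be replayed in code. This sketch takes the inner product on W to be (A|B) = ½ tr(ABᵗ) = ½ Σᵢⱼ AᵢⱼBᵢⱼ. The test vectors are arbitrary choices of ours.

```python
def T(x):
    """The map from R^3 to skew-symmetric 3 x 3 matrices in Example 23."""
    x1, x2, x3 = x
    return [[0, -x3, x2], [x3, 0, -x1], [-x2, x1, 0]]


def inner_W(A, B):
    """(A|B) = (1/2) tr(A B^t) = (1/2) * sum over i, j of A_ij * B_ij."""
    return sum(A[i][j] * B[i][j] for i in range(3) for j in range(3)) / 2


def dot(x, y):
    """Standard inner product on R^3."""
    return sum(a * b for a, b in zip(x, y))


x, y = (1, -2, 3), (4, 0.5, 2)
print(inner_W(T(x), T(y)), dot(x, y))  # 9.0 9.0
```

Replacing `B[i][j]` by the entries of B itself (that is, using tr(AB) instead of tr(ABᵗ)) would flip the sign, since Bᵗ = −B on W.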


EXAMPLE 24. It is not always particularly convenient to describe an
isomorphism in terms of orthonormal bases. For example, suppose G = P*P
where P is an invertible n × n matrix with complex entries. Let V be the
space of complex n × 1 matrices, with the inner product [X|Y] = Y*GX.
Let W be the same vector space, with the standard inner product (X|Y) =
Y*X. We know that V and W are isomorphic inner product spaces. It
would seem that the most convenient way to describe an isomorphism
between V and W is the following: Let T be the linear transformation
from V into W defined by T(X) = PX. Then

$$\begin{aligned}
(TX|TY) &= (PX|PY)\\
&= (PY)^*(PX)\\
&= Y^*P^*PX\\
&= Y^*GX\\
&= [X|Y].
\end{aligned}$$

Hence T is an isomorphism.
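Example 24 can be illustrated with a concrete invertible P. The matrix `P` and the vectors `X`, `Y` below are arbitrary choices for this sketch. The check is that (PX|PY) agrees with [X|Y] = Y*GX when G = P*P.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def adjoint(X):
    """Conjugate transpose X*."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def std_inner(X, Y):
    """(X|Y) = Y* X for column vectors stored as lists."""
    return sum(x * complex(y).conjugate() for x, y in zip(X, Y))


P = [[1, 2j], [0, 3]]        # invertible: det P = 3
G = matmul(adjoint(P), P)    # G = P* P


def G_inner(X, Y):
    """[X|Y] = Y* G X, i.e. the standard inner product of GX with Y."""
    return std_inner(apply(G, X), Y)


X = [1 - 1j, 2]
Y = [0.5j, -1]
print(std_inner(apply(P, X), apply(P, Y)), G_inner(X, Y))  # equal: (PX|PY) = [X|Y]
```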


EXAMPLE 25. Let V be the space of all continuous real-valued
functions on the unit interval, 0 ≤ t ≤ 1, with the inner product

$$[f|g] = \int_0^1 f(t)g(t)\,t^2\,dt.$$

Let W be the same vector space with the inner product

$$(f|g) = \int_0^1 f(t)g(t)\,dt.$$

Let T be the linear transformation from V into W given by

$$(Tf)(t) = tf(t).$$

Then (Tf|Tg) = [f|g], and so T preserves inner products; however, T is
not an isomorphism of V onto W, because the range of T is not all of W:
every function in the range vanishes at t = 0, so, for example, the
constant function 1 is not in the range. Of course, this happens because
the underlying vector space is not finite-dimensional.


Theorem 11. Let V and W be inner product spaces over the same field,
and let T be a linear transformation from V into W. Then T preserves inner
products if and only if ‖Tα‖ = ‖α‖ for every α in V.


Proof. If T preserves inner products, T ‘preserves norms.’ Suppose
‖Tα‖ = ‖α‖ for every α in V. Then ‖Tα‖² = ‖α‖². Now using the
appropriate polarization identity, (8-3) or (8-4), and the fact that T is
linear, one easily obtains (α|β) = (Tα|Tβ) for all α, β in V. ∎
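The proof of Theorem 11 rests on the fact that, in a complex inner product space, inner products can be recovered from norms. As a sketch, we assume the complex polarization identity in the form (α|β) = ¼ Σₙ₌₁⁴ iⁿ‖α + iⁿβ‖², which we take to be the identity the text cites as (8-4). The code recovers (α|β) on C³ using norms only. Because a linear T maps α + iⁿβ to Tα + iⁿTβ, a norm-preserving T leaves every term of the sum unchanged.

```python
def inner(x, y):
    """Standard inner product on C^n: (x|y) = sum_k x_k * conj(y_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def norm_sq(x):
    """||x||^2, which is real for any inner product."""
    return inner(x, x).real


def polarize(alpha, beta):
    """Recover (alpha|beta) from norms alone:
    (alpha|beta) = (1/4) * sum_{n=1}^{4} i^n ||alpha + i^n beta||^2."""
    total = 0
    for n in range(1, 5):
        c = 1j ** n
        total += c * norm_sq([a + c * b for a, b in zip(alpha, beta)])
    return total / 4


alpha = [1 + 2j, -1, 0.5j]
beta = [3, 1j, 2 - 1j]
print(polarize(alpha, beta), inner(alpha, beta))  # equal up to rounding
```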


Definition. A unitary operator on an inner product space is an iso- 
morphism of the space onto itself. 


The product of two unitary operators is unitary. For, if U₁ and U₂
are unitary, then U₂U₁ is invertible and ‖U₂U₁α‖ = ‖U₁α‖ = ‖α‖ for
each α. Also, the inverse of a unitary operator is unitary, since ‖Uα‖ =
‖α‖ says ‖U⁻¹β‖ = ‖β‖, where β = Uα. Since the identity operator is
clearly unitary, we see that the set of all unitary operators on an inner
product space is a group, under the operation of composition.

If V is a finite-dimensional inner product space and U is a linear
operator on V, Theorem 10 tells us that U is unitary if and only if
(Uα|Uβ) = (α|β) for each α, β in V; or, if and only if for some (every)
orthonormal basis {α₁, ..., αₙ} it is true that {Uα₁, ..., Uαₙ} is an
orthonormal basis.


Theorem 12. Let U be a linear operator on an inner product space V.
Then U is unitary if and only if the adjoint U* of U exists and UU* =
U*U = I.

Proof. Suppose U is unitary. Then U is invertible and

$$(U\alpha|\beta) = (U\alpha|UU^{-1}\beta) = (\alpha|U^{-1}\beta)$$

for all α, β. Hence U⁻¹ is the adjoint of U.

Conversely, suppose U* exists and UU* = U*U = I. Then U is
invertible, with U⁻¹ = U*. So, we need only show that U preserves inner
products. We have

$$\begin{aligned}
(U\alpha|U\beta) &= (\alpha|U^*U\beta)\\
&= (\alpha|I\beta)\\
&= (\alpha|\beta)
\end{aligned}$$

for all α, β. ∎


EXAMPLE 26. Consider Cⁿˣ¹ with the inner product (X|Y) = Y*X.
Let A be an n × n matrix over C, and let U be the linear operator defined
by U(X) = AX. Then

$$(UX|UY) = (AX|AY) = Y^*A^*AX$$

for all X, Y. Hence, U is unitary if and only if A*A = I.


Definition. A complex n × n matrix A is called unitary if A*A = I.


Theorem 13. Let V be a finite-dimensional inner product space and
let U be a linear operator on V. Then U is unitary if and only if the matrix
of U in some (or every) ordered orthonormal basis is a unitary matrix.


Proof. At this point, this is not much of a theorem, and we state
it largely for emphasis. If ℬ = {α₁, ..., αₙ} is an ordered orthonormal
basis for V and A is the matrix of U relative to ℬ, then A* is the matrix of
U* (corollary to Theorem 8), so A*A = I if and only if U*U = I; in finite
dimensions this already forces UU* = I. The result now follows from Theorem 12. ∎


Let A be an n × n matrix. The statement that A is unitary simply
means

$$(A^*A)_{jk} = \delta_{jk}$$

or

$$\sum_{r=1}^{n} \overline{A_{rj}}\,A_{rk} = \delta_{jk}.$$

In other words, it means that the columns of A form an orthonormal set
of column matrices, with respect to the standard inner product (X|Y) =
Y*X. Since A*A = I if and only if AA* = I, we see that A is unitary
exactly when the rows of A comprise an orthonormal set of n-tuples in Cⁿ
(with the standard inner product). So, using standard inner products,
A is unitary if and only if the rows and columns of A are orthonormal sets.
One sees here an example of the power of the theorem which states that a 
one-sided inverse for a matrix is a two-sided inverse. Applying this theorem 
as we did above, say to real matrices, we have the following: Suppose we 
have a square array of real numbers such that the sum of the squares of 
the entries in each row is 1 and distinct rows are orthogonal. Then the 
sum of the squares of the entries in each column is 1 and distinct columns 
are orthogonal. Write down the proof of this for a 3 X 3 array, without 
using any knowledge of matrices, and you should be reasonably impressed. 
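The row and column statement can be tested directly. This sketch uses a 3 × 3 real array with orthonormal rows and a 2 × 2 complex unitary matrix, both chosen by us. The helper `is_orthonormal` checks all pairwise inner products against δⱼₖ.

```python
import math


def inner(x, y):
    """Standard inner product on C^n: (x|y) = sum_k x_k * conj(y_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def is_orthonormal(vectors, tol=1e-12):
    """True if (u_i|u_j) = delta_ij for all pairs, up to rounding."""
    return all(abs(inner(u, v) - (1 if i == j else 0)) < tol
               for i, u in enumerate(vectors) for j, v in enumerate(vectors))


def columns(A):
    return [list(col) for col in zip(*A)]


# A real 3 x 3 array (not symmetric) whose rows are orthonormal.
A = [[1/3, 2/3, 2/3], [-2/3, -1/3, 2/3], [2/3, -2/3, 1/3]]
print(is_orthonormal(A), is_orthonormal(columns(A)))  # True True

# A complex unitary matrix.
s = 1 / math.sqrt(2)
V = [[s, 1j * s], [1j * s, s]]
print(is_orthonormal(V), is_orthonormal(columns(V)))  # True True
```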


Definition. A real or complex n × n matrix A is said to be
orthogonal if AᵗA = I.


A real orthogonal matrix is unitary; and, a unitary matrix is 
orthogonal if and only if each of its entries is real. 


EXAMPLE 27. We give some examples of unitary and orthogonal 
matrices. 


(a) A 1 × 1 matrix [c] is orthogonal if and only if c = ±1, and
unitary if and only if c̄c = 1. The latter condition means (of course) that
|c| = 1, or c = eⁱᶿ, where θ is real.


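(b) Let

$$A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}.$$

Then A is orthogonal if and only if

$$A^t = A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b\\ -c & a \end{bmatrix}.$$

The determinant of any orthogonal matrix is easily seen to be ±1. Thus
A is orthogonal if and only if

$$A = \begin{bmatrix} a & b\\ -b & a \end{bmatrix}$$

or

$$A = \begin{bmatrix} a & b\\ b & -a \end{bmatrix}$$

where a² + b² = 1. The two cases are distinguished by the value of det A.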
(c) The well-known relations between the trigonometric functions
show that the matrix

$$A_\theta = \begin{bmatrix} \cos\theta & -\sin\theta\\ \sin\theta & \cos\theta \end{bmatrix}$$

is orthogonal. If θ is a real number, then A_θ is the matrix in the standard
ordered basis for R² of the linear operator U_θ, rotation through the angle θ.
The statement that A_θ is a real orthogonal matrix (hence unitary) simply
means that U_θ is a unitary operator, i.e., preserves dot products.


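(d) Let

$$A = \begin{bmatrix} a & b\\ c & d \end{bmatrix}.$$

Then A is unitary if and only if

$$\begin{bmatrix} \bar a & \bar c\\ \bar b & \bar d \end{bmatrix} = A^* = A^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b\\ -c & a \end{bmatrix}.$$

The determinant of a unitary matrix has absolute value 1, and is thus a
complex number of the form eⁱᶿ, θ real. Thus A is unitary if and only if

$$A = \begin{bmatrix} 1 & 0\\ 0 & e^{i\theta} \end{bmatrix}\begin{bmatrix} a & b\\ -\bar b & \bar a \end{bmatrix} = \begin{bmatrix} a & b\\ -e^{i\theta}\bar b & e^{i\theta}\bar a \end{bmatrix}$$

where θ is a real number, and a, b are complex numbers such that
|a|² + |b|² = 1.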
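The parametrization in Example 27(d) can be checked numerically. In this sketch the values of θ, a and b are arbitrary, subject to |a|² + |b|² = 1, and the helper names are ours. The determinant of the constructed matrix should be eⁱᶿ.

```python
import cmath


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def adjoint(X):
    """Conjugate transpose X*."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


def unitary_2x2(theta, a, b):
    """A = diag(1, e^{i theta}) [[a, b], [-conj(b), conj(a)]],
    assuming |a|^2 + |b|^2 = 1."""
    w = cmath.exp(1j * theta)
    return [[a, b], [-w * b.conjugate(), w * a.conjugate()]]


a, b = (1 + 1j) / 2, 1j / 2 ** 0.5   # |a|^2 + |b|^2 = 1/2 + 1/2 = 1
theta = 0.7
A = unitary_2x2(theta, a, b)
AstarA = matmul(adjoint(A), A)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(AstarA)                        # the identity, up to rounding
print(abs(det), cmath.phase(det))    # modulus 1, argument theta
```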





As noted earlier, the unitary operators on an inner product space 
form a group. From this and Theorem 13 it follows that the set U(n) of 
all n × n unitary matrices is also a group. Thus the inverse of a unitary
matrix and the product of two unitary matrices are again unitary. Of
course this is easy to see directly. An n × n matrix A with complex entries
is unitary if and only if A⁻¹ = A*. Thus, if A is unitary, we have (A⁻¹)⁻¹ =
A = (A*)⁻¹ = (A⁻¹)*. If A and B are n × n unitary matrices, then
(AB)⁻¹ = B⁻¹A⁻¹ = B*A* = (AB)*.

The Gram-Schmidt process in Cⁿ has an interesting corollary for
matrices that involves the group U(n). 


Theorem 14. For every invertible complex n × n matrix B there exists
a unique lower-triangular matrix M with positive entries on the main diagonal
such that MB is unitary.


Proof. The rows β₁, ..., βₙ of B form a basis for Cⁿ. Let α₁, ..., αₙ
be the vectors obtained from β₁, ..., βₙ by the Gram-Schmidt process.
Then, for 1 ≤ k ≤ n, {α₁, ..., αₖ} is an orthogonal basis for the subspace
spanned by {β₁, ..., βₖ}, and

$$\alpha_k = \beta_k - \sum_{j<k} \frac{(\beta_k|\alpha_j)}{\|\alpha_j\|^2}\,\alpha_j.$$

Hence, for each k there exist unique scalars Cₖⱼ such that

$$\alpha_k = \beta_k - \sum_{j<k} C_{kj}\,\beta_j.$$

Let U be the unitary matrix with rows

$$\frac{\alpha_1}{\|\alpha_1\|}, \ldots, \frac{\alpha_n}{\|\alpha_n\|}$$

and M the matrix defined by

$$M_{kj} = \begin{cases} -\dfrac{1}{\|\alpha_k\|}\,C_{kj}, & \text{if } j < k\\[1ex] \dfrac{1}{\|\alpha_k\|}, & \text{if } j = k\\[1ex] 0, & \text{if } j > k. \end{cases}$$

Then M is lower-triangular, in the sense that its entries above the main
diagonal are 0. The entries Mₖₖ of M on the main diagonal are all > 0, and

$$\frac{\alpha_k}{\|\alpha_k\|} = \sum_{j=1}^{n} M_{kj}\,\beta_j, \qquad 1 \le k \le n.$$

Now these equations simply say that

$$U = MB.$$


To prove the uniqueness of M, let T⁺(n) denote the set of all complex
n × n lower-triangular matrices with positive entries on the main diagonal.
Suppose M₁ and M₂ are elements of T⁺(n) such that MᵢB is in U(n) for
i = 1, 2. Then because U(n) is a group

$$(M_1B)(M_2B)^{-1} = M_1M_2^{-1}$$

lies in U(n). On the other hand, although it is not entirely obvious, T⁺(n)
is also a group under matrix multiplication. One way to see this is to
consider the geometric properties of the linear transformations

$$X \to MX, \qquad (M \text{ in } T^+(n))$$

on the space of column matrices. Thus M₂⁻¹, M₁M₂⁻¹, and (M₁M₂⁻¹)⁻¹ are
all in T⁺(n). But, since M₁M₂⁻¹ is in U(n), (M₁M₂⁻¹)⁻¹ = (M₁M₂⁻¹)*. The
transpose or conjugate transpose of any lower-triangular matrix is an
upper-triangular matrix. Therefore, M₁M₂⁻¹ is simultaneously upper-
and lower-triangular, i.e., diagonal. A diagonal matrix is unitary if and
only if each of its entries on the main diagonal has absolute value 1; if the
diagonal entries are all positive, they must equal 1. Hence M₁M₂⁻¹ = I
and M₁ = M₂. ∎
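The existence half of the proof of Theorem 14 is constructive, so it can be sketched in code. The function below is a hypothetical helper, not from the text. It runs the Gram-Schmidt process on the rows of B while tracking each αₖ as a combination of the rows βⱼ, which yields M directly. It assumes B is invertible and uses floating-point arithmetic, so the results hold only up to rounding.

```python
import math


def inner(x, y):
    """Standard inner product on C^n: (x|y) = sum_k x_k * conj(y_k)."""
    return sum(p * complex(q).conjugate() for p, q in zip(x, y))


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def lower_triangular_unitarizer(B):
    """Return (M, U) with M lower-triangular with positive diagonal and
    U = MB unitary, following the Gram-Schmidt construction in the proof.
    The rows beta_1, ..., beta_n of B are assumed linearly independent."""
    n = len(B)
    alphas = []   # orthogonal vectors alpha_1, ..., alpha_n
    coeffs = []   # alpha_k = sum_j coeffs[k][j] * beta_j (lower-triangular, 1 on diagonal)
    for k in range(n):
        vec = [complex(v) for v in B[k]]
        co = [0j] * n
        co[k] = 1 + 0j
        for j in range(k):
            # Subtract the component of beta_k along alpha_j.
            r = inner(B[k], alphas[j]) / inner(alphas[j], alphas[j])
            vec = [v - r * a for v, a in zip(vec, alphas[j])]
            co = [c - r * cj for c, cj in zip(co, coeffs[j])]
        alphas.append(vec)
        coeffs.append(co)
    norms = [math.sqrt(inner(a, a).real) for a in alphas]
    M = [[coeffs[k][j] / norms[k] for j in range(n)] for k in range(n)]
    U = [[v / norms[k] for v in alphas[k]] for k in range(n)]
    return M, U


B = [[1, 1j, 0], [2, 0, 1], [0, 1 - 1j, 3]]   # invertible: det B = -1 - 5i
M, U = lower_triangular_unitarizer(B)
for row in M:
    print([round(abs(m), 4) for m in row])    # zeros above the diagonal
```

Since the diagonal coefficient of each αₖ in terms of βₖ is 1, the diagonal of M is 1/‖αₖ‖ > 0, exactly as in the proof.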
Let GL(n) denote the set of all invertible complex n × n matrices.
Then GL(n) is also a group under matrix multiplication. This group is
called the general linear group. Theorem 14 is equivalent to the
following result.


Corollary. For each B in GL(n) there exist unique matrices N and U
such that N is in T⁺(n), U is in U(n), and

$$B = N \cdot U.$$

Proof. By the theorem there is a unique matrix M in T⁺(n) such
that MB is in U(n). Let MB = U and N = M⁻¹. Then N is in T⁺(n) and
B = N · U. On the other hand, if we are given any elements N and U
such that N is in T⁺(n), U is in U(n), and B = N · U, then N⁻¹B is in
U(n) and N⁻¹ is the unique matrix M which is characterized by the
theorem; furthermore U is necessarily N⁻¹B. ∎


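EXAMPLE 28. Let x₁ and x₂ be real numbers such that x₁² + x₂² = 1
and x₁ ≠ 0. Let

$$B = \begin{bmatrix} x_1 & x_2 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.$$

Applying the Gram-Schmidt process to the rows of B, we obtain the
vectors

$$\begin{aligned}
\alpha_1 &= (x_1, x_2, 0)\\
\alpha_2 &= (0, 1, 0) - x_2(x_1, x_2, 0) = x_1(-x_2, x_1, 0)\\
\alpha_3 &= (0, 0, 1).
\end{aligned}$$

Let U be the matrix with rows α₁, (α₂/x₁), α₃. Then U is unitary, and

$$U = \begin{bmatrix} x_1 & x_2 & 0\\ -x_2 & x_1 & 0\\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0\\ -\frac{x_2}{x_1} & \frac{1}{x_1} & 0\\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 & x_2 & 0\\ 0 & 1 & 0\\ 0 & 0 & 1 \end{bmatrix}.$$

Now multiplying by the inverse of

$$M = \begin{bmatrix} 1 & 0 & 0\\ -\frac{x_2}{x_1} & \frac{1}{x_1} & 0\\ 0 & 0 & 1 \end{bmatrix}$$

we find that

$$B = \begin{bmatrix} 1 & 0 & 0\\ x_2 & x_1 & 0\\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 & x_2 & 0\\ -x_2 & x_1 & 0\\ 0 & 0 & 1 \end{bmatrix}.$$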
Let us now consider briefly change of coordinates in an inner product
space. Suppose V is a finite-dimensional inner product space and that
ℬ = {α₁, ..., αₙ} and ℬ′ = {α′₁, ..., α′ₙ} are two ordered orthonormal
bases for V. There is a unique (necessarily invertible) n × n matrix P
such that

$$[\alpha]_{\mathcal B} = P[\alpha]_{\mathcal B'}$$

for every α in V. If U is the unique linear operator on V defined by
Uαⱼ = α′ⱼ, then P is the matrix of U in the ordered basis ℬ:

$$\alpha'_k = \sum_{j=1}^{n} P_{jk}\,\alpha_j.$$

Since ℬ and ℬ′ are orthonormal bases, U is a unitary operator and P is
a unitary matrix. If T is any linear operator on V, then

$$[T]_{\mathcal B'} = P^{-1}[T]_{\mathcal B}P = P^*[T]_{\mathcal B}P.$$


Definition. Let A and B be complex n × n matrices. We say that B
is unitarily equivalent to A if there is an n × n unitary matrix P such
that B = P⁻¹AP. We say that B is orthogonally equivalent to A if there
is an n × n orthogonal matrix P such that B = P⁻¹AP.


With this definition, what we observed above may be stated as
follows: If ℬ and ℬ′ are two ordered orthonormal bases for V, then, for
each linear operator T on V, the matrix [T]_ℬ′ is unitarily equivalent to
the matrix [T]_ℬ. In case V is a real inner product space, these matrices
are orthogonally equivalent, via a real orthogonal matrix.
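The change-of-coordinates formula can be checked against Theorem 8. In this sketch ℬ is the standard basis of C², and the columns of `P` give the coordinates of an orthonormal basis ℬ′ of our choosing. The code compares P*[T]_ℬP with the matrix whose entries are (Tα′ⱼ|α′ₖ).

```python
import math


def inner(x, y):
    """Standard inner product on C^n: (x|y) = sum_k x_k * conj(y_k)."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def adjoint(X):
    """Conjugate transpose X*."""
    return [[complex(X[j][i]).conjugate() for j in range(len(X))]
            for i in range(len(X[0]))]


s = 1 / math.sqrt(2)
# Column j of P holds the coordinates of alpha'_j in the standard basis:
# alpha'_1 = (1, i)/sqrt(2), alpha'_2 = (1, -i)/sqrt(2).
P = [[s, s], [1j * s, -1j * s]]
T = [[2, 1 - 1j], [3j, -1]]                   # [T] in the standard basis

T_new = matmul(matmul(adjoint(P), T), P)      # P* [T] P
new_basis = [[P[0][j], P[1][j]] for j in range(2)]
direct = [[inner(apply(T, new_basis[j]), new_basis[k]) for j in range(2)]
          for k in range(2)]                  # entries (T alpha'_j | alpha'_k)
print(T_new)
print(direct)  # agrees with T_new up to rounding
```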


Exercises 


1. Find a unitary matrix which is not orthogonal, and find an orthogonal matrix
which is not unitary. 


2. Let V be the space of complex n × n matrices with inner product (A|B) =
tr (AB*). For each M in V, let T_M be the linear operator defined by T_M(A) = MA.
Show that T_M is unitary if and only if M is a unitary matrix.


3. Let V be the set of complex numbers, regarded as a real vector space.

(a) Show that (α|β) = Re (αβ̄) defines an inner product on V.

(b) Exhibit an (inner product space) isomorphism of V onto R² with the
standard inner product.

(c) For each γ in V, let M_γ be the linear operator on V defined by M_γ(α) = γα.
Show that (M_γ)* = M_γ̄.

(d) For which complex numbers γ is M_γ self-adjoint?

(e) For which γ is M_γ unitary?

(f) For which γ is M_γ positive?

(g) What is det (M_γ)?

(h) Find the matrix of M_γ in the basis {1, i}.

(i) If T is a linear operator on V, find necessary and sufficient conditions
for T to be an M_γ.

(j) Find a unitary operator on V which is not an M_γ.


4. Let V be R², with the standard inner product. If U is a unitary operator on V,
show that the matrix of U in the standard ordered basis is either

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \quad\text{or}\quad \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$

for some real θ, 0 ≤ θ < 2π. Let U_θ be the linear operator corresponding to the
first matrix, i.e., U_θ is rotation through the angle θ. Now convince yourself that
every unitary operator on V is either a rotation, or reflection about the ε₁-axis
followed by a rotation.

(a) What is U_θU_φ?

(b) Show that U_θ* = U_{−θ}.

(c) Let φ be a fixed real number, and let ℬ = {α₁, α₂} be the orthonormal
basis obtained by rotating {ε₁, ε₂} through the angle φ, i.e., αⱼ = U_φεⱼ. If θ is
another real number, what is the matrix of U_θ in the ordered basis ℬ?


5. Let V be R³, with the standard inner product. Let W be the plane spanned
by α = (1, 1, 1) and β = (1, 1, −2). Let U be the linear operator defined,
geometrically, as follows: U is rotation through the angle θ, about the straight line
through the origin which is orthogonal to W. There are actually two such rotations;
choose one. Find the matrix of U in the standard ordered basis. (Here is one
way you might proceed. Find α₁ and α₂ which form an orthonormal basis for W.
Let α₃ be a vector of norm 1 which is orthogonal to W. Find the matrix of U in
the basis {α₁, α₂, α₃}. Perform a change of basis.)


6. Let V be a finite-dimensional inner product space, and let W be a subspace
of V. Then V = W ⊕ W⊥, that is, each α in V is uniquely expressible in the form
α = β + γ, with β in W and γ in W⊥. Define a linear operator U by Uα = β − γ.

(a) Prove that U is both self-adjoint and unitary.
(b) If V is R³ with the standard inner product and W is the subspace spanned
by (1, 0, 1), find the matrix of U in the standard ordered basis.


7. Let V be a complex inner product space and T a self-adjoint linear operator
on V. Show that
(a) ||α + iTα|| = ||α − iTα|| for every α in V.
(b) α + iTα = β + iTβ if and only if α = β.
(c) I + iT is non-singular.
(d) I − iT is non-singular.
(e) Now suppose V is finite-dimensional, and prove that

$$U = (I - iT)(I + iT)^{-1}$$

is a unitary operator; U is called the Cayley transform of T. In a certain sense,
U = f(T), where f(x) = (1 − ix)/(1 + ix).
8. If θ is a real number, prove that the following matrices are unitarily equivalent

$$\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}, \qquad \begin{bmatrix} e^{i\theta} & 0 \\ 0 & e^{-i\theta} \end{bmatrix}.$$

9. Let V be a finite-dimensional inner product space and T a positive linear
operator on V. Let p_T be the inner product on V defined by p_T(α, β) = (Tα|β).
Let U be a linear operator on V and U* its adjoint with respect to ( | ). Prove
that U is unitary with respect to the inner product p_T if and only if T = U*TU.


10. Let V be a finite-dimensional inner product space. For each α, β in V, let
T_{α,β} be the linear operator on V defined by T_{α,β}(γ) = (γ|β)α. Show that

(a) T*_{α,β} = T_{β,α}.

(b) trace (T_{α,β}) = (α|β).

(c) T_{α,β}T_{γ,δ} = T_{α,(β|γ)δ}.

(d) Under what conditions is T_{α,β} self-adjoint?


11. Let V be an n-dimensional inner product space over the field F, and let L(V, V)
be the space of linear operators on V. Show that there is a unique inner product
on L(V, V) with the property that ||T_{α,β}||² = ||α||²||β||² for all α, β in V. (T_{α,β}
is the operator defined in Exercise 10.) Find an isomorphism between L(V, V)
with this inner product and the space of n × n matrices over F, with the inner
product (A|B) = tr (AB*).


12. Let V be a finite-dimensional inner product space. In Exercise 6, we showed 
how to construct some linear operators on V which are both self-adjoint and 
unitary. Now prove that there are no others, i.e., that every self-adjoint unitary 
operator arises from some subspace W as we described in Exercise 6. 


13. Let V and W be finite-dimensional inner product spaces having the same 
dimension. Let U be an isomorphism of V onto W. Show that: 

(a) The mapping T → UTU⁻¹ is an isomorphism of the vector space L(V, V)
onto the vector space L(W, W).

(b) trace (UTU⁻¹) = trace (T) for each T in L(V, V).

(c) UT_{α,β}U⁻¹ = T_{Uα,Uβ} (T_{α,β} defined in Exercise 10).

(d) (UTU⁻¹)* = UT*U⁻¹.

(e) If we equip L(V, V) with inner product (T₁|T₂) = trace (T₁T₂*), and
similarly for L(W, W), then T → UTU⁻¹ is an inner product space isomorphism.


14. If V is an inner product space, a rigid motion is any function T from V
into V (not necessarily linear) such that ||Tα − Tβ|| = ||α − β|| for all α, β in V.
One example of a rigid motion is a linear unitary operator. Another example is
translation by a fixed vector γ:

$$T_\gamma(\alpha) = \alpha + \gamma.$$

(a) Let V be R² with the standard inner product. Suppose T is a rigid motion
of V and that T(0) = 0. Prove that T is linear and a unitary operator.

(b) Use the result of part (a) to prove that every rigid motion of R² is composed
of a translation, followed by a unitary operator.

(c) Now show that a rigid motion of R² is either a translation followed by a
rotation, or a translation followed by a reflection followed by a rotation.
15. A unitary operator on R⁴ (with the standard inner product) is simply a linear
operator which preserves the quadratic form

$$\|(x, y, z, t)\|^2 = x^2 + y^2 + z^2 + t^2,$$

that is, a linear operator U such that ||Uα||² = ||α||² for all α in R⁴. In a certain
part of the theory of relativity, it is of interest to find the linear operators T which
preserve the form

$$\|(x, y, z, t)\|_L^2 = t^2 - x^2 - y^2 - z^2.$$

Now || ||²_L does not come from an inner product, but from something called
the 'Lorentz metric' (which we shall not go into). For that reason, a linear operator
T on R⁴ such that ||Tα||²_L = ||α||²_L, for every α in R⁴, is called a Lorentz
transformation.

(a) Show that the function U defined by

$$U(x, y, z, t) = \begin{bmatrix} t + x & y + iz \\ y - iz & t - x \end{bmatrix}$$

is an isomorphism of R⁴ onto the real vector space H of all self-adjoint 2 × 2
complex matrices.

(b) Show that ||α||²_L = det (Uα).

(c) Suppose T is a (real) linear operator on the space H of 2 × 2 self-adjoint
matrices. Show that L = U⁻¹TU is a linear operator on R⁴.

(d) Let M be any 2 × 2 complex matrix. Show that T_M(A) = M*AM defines
a linear operator T_M on H. (Be sure you check that T_M maps H into H.)

(e) If M is a 2 × 2 matrix such that |det M| = 1, show that L_M = U⁻¹T_MU
is a Lorentz transformation on R⁴.

(f) Find a Lorentz transformation which is not an L_M.


The principal objective in this section is the solution of the following
problem. If T is a linear operator on a finite-dimensional inner product
space V, under what conditions does V have an orthonormal basis con-
sisting of characteristic vectors for T? In other words, when is there an
orthonormal basis ℬ for V, such that the matrix of T in the basis ℬ is
diagonal?

We shall begin by deriving some necessary conditions on T, which
we shall subsequently show are sufficient. Suppose ℬ = {α₁, …, αₙ} is
an orthonormal basis for V with the property

$$T\alpha_j = c_j\alpha_j, \qquad j = 1, \ldots, n. \tag{8-16}$$


This simply says that the matrix of T in the ordered basis ℬ is the diagonal
matrix with diagonal entries c₁, …, cₙ. The adjoint operator T* is repre-
sented in this same ordered basis by the conjugate transpose matrix, i.e.,
the diagonal matrix with diagonal entries c̄₁, …, c̄ₙ. If V is a real inner
product space, the scalars c₁, …, cₙ are (of course) real, and so it must
be that T = T*. In other words, if V is a finite-dimensional real inner
product space and T is a linear operator for which there is an orthonormal
basis of characteristic vectors, then T must be self-adjoint. If V is a com-
plex inner product space, the scalars c₁, …, cₙ need not be real, i.e.,
T need not be self-adjoint. But notice that T must satisfy

$$TT^* = T^*T. \tag{8-17}$$


For, any two diagonal matrices commute, and since T and T* are both
represented by diagonal matrices in the ordered basis ℬ, we have (8-17).
It is a rather remarkable fact that in the complex case this condition is 
also sufficient to imply the existence of an orthonormal basis of charac- 
teristic vectors. 


Definition. Let V be a finite-dimensional inner product space and T a
linear operator on V. We say that T is normal if it commutes with its adjoint,
i.e., TT* = T*T.


Any self-adjoint operator is normal, as is any unitary operator. Any 
scalar multiple of a normal operator is normal; however, sums and prod- 
ucts of normal operators are not generally normal. Although it is by no 
means necessary, we shall begin our study of normal operators by con- 
sidering self-adjoint operators. 


Theorem 15. Let V be an inner product space and T a self-adjoint 
linear operator on V. Then each characteristic value of T is real, and char- 
acteristic vectors of T associated with distinct characteristic values are 
orthogonal. 


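Proof. Suppose c is a characteristic value of T, i.e., that Tα = cα
for some non-zero vector α. Then

$$\begin{aligned} c(\alpha|\alpha) &= (c\alpha|\alpha) \\ &= (T\alpha|\alpha) \\ &= (\alpha|T\alpha) \\ &= (\alpha|c\alpha) \\ &= \bar{c}(\alpha|\alpha). \end{aligned}$$

Since (α|α) ≠ 0, we must have c = c̄. Suppose we also have Tβ = dβ with
β ≠ 0. Then

$$\begin{aligned} c(\alpha|\beta) &= (T\alpha|\beta) \\ &= (\alpha|T\beta) \\ &= (\alpha|d\beta) \\ &= \bar{d}(\alpha|\beta) \\ &= d(\alpha|\beta), \end{aligned}$$

the last step because d is real, by the first part of the proof.
If c ≠ d, then (α|β) = 0. ∎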
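For 2 × 2 Hermitian matrices, Theorem 15 can be seen by hand: the characteristic polynomial of [[a, b], [b̄, d]] (with a, d real) is x² − (a + d)x + (ad − |b|²), whose discriminant (a − d)² + 4|b|² is never negative. The pure-Python sketch below computes both characteristic values from that formula and checks that characteristic vectors for the two distinct values are orthogonal. The function name `hermitian_eig_2x2` and the sample entries are our own illustrative choices, not the text's.

```python
import math


def hermitian_eig_2x2(a, b, d):
    """Characteristic values and vectors of the Hermitian matrix [[a, b], [conj(b), d]].

    a and d must be real.  The discriminant (a - d)^2 + 4|b|^2 is >= 0,
    so both characteristic values are real, as Theorem 15 predicts.
    """
    root = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
    c1, c2 = (a + d + root) / 2, (a + d - root) / 2
    if b == 0:
        # Already diagonal: the standard basis vectors are characteristic vectors.
        e1, e2 = ([1, 0], [0, 1]) if a >= d else ([0, 1], [1, 0])
        return [(c1, e1), (c2, e2)]
    # (b, c - a) is annihilated by the first row of A - cI, hence (rank 1) by A - cI.
    return [(c, [b, c - a]) for c in (c1, c2)]


def inner(x, y):
    """Standard inner product (x|y) = sum of x_i * conj(y_i) on C^2."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))


a, b, d = 2.0, 1 - 1j, -1.0          # hypothetical Hermitian matrix [[2, 1-i], [1+i, -1]]
pairs = hermitian_eig_2x2(a, b, d)
for c, v in pairs:
    Av = [a * v[0] + b * v[1], b.conjugate() * v[0] + d * v[1]]
    assert all(abs(Av[i] - c * v[i]) < 1e-12 for i in range(2))   # Av = cv

(c1, v1), (c2, v2) = pairs
assert abs(inner(v1, v2)) < 1e-12     # distinct characteristic values: orthogonal vectors
print(c1, c2)
```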
It should be pointed out that Theorem 15 says nothing about the
existence of characteristic values or characteristic vectors. 


Theorem 16. On a finite-dimensional inner product space of positive 
dimension, every self-adjoint operator has a (non-zero) characteristic vector. 


Proof. Let V be an inner product space of dimension n, where 
n > 0, and let T be a self-adjoint operator on V. Choose an orthonormal 
basis ℬ for V and let A = [T]_ℬ. Since T = T*, we have A = A*. Now
let W be the space of n × 1 matrices over C, with inner product (X|Y) =
Y*X. Then U(X) = AX defines a self-adjoint linear operator U on W.
The characteristic polynomial, det (xI − A), is a polynomial of degree n
over the complex numbers; every polynomial over C of positive degree
has a root. Thus, there is a complex number c such that det (cI − A) = 0.
This means that A − cI is singular, or that there exists a non-zero X
such that AX = cX. Since the operator U (multiplication by A) is self-
adjoint, it follows from Theorem 15 that c is real. If V is a real vector
space, we may choose X to have real entries. For then A and A − cI have
real entries, and since A − cI is singular, the system (A − cI)X = 0 has
a non-zero real solution X. It follows that there is a non-zero vector α in
V such that Tα = cα. ∎


There are several comments we should make about the proof. 

(1) The proof of the existence of a non-zero X such that AX = cX
had nothing to do with the fact that A was Hermitian (self-adjoint). It 
shows that any linear operator on a finite-dimensional complex vector 
space has a characteristic vector. In the case of a real inner product space, 
the self-adjointness of A is used very heavily, to tell us that each charac- 
teristic value of A is real and hence that we can find a suitable X with 
real entries. 

(2) The argument shows that the characteristic polynomial of a self- 
adjoint matrix has real coefficients, in spite of the fact that the matrix 
may not have real entries. 

(3) The assumption that V is finite-dimensional is necessary for the 
theorem; a self-adjoint operator on an infinite-dimensional inner product 
space need not have a characteristic value.


EXAMPLE 29. Let V be the vector space of continuous complex-valued
(or real-valued) functions on the unit interval, 0 ≤ t ≤ 1, with the
inner product

$$(f|g) = \int_0^1 f(t)\overline{g(t)}\,dt.$$

The operator 'multiplication by t,' (Tf)(t) = tf(t), is self-adjoint. Let us
suppose that Tf = cf. Then

$$(t - c)f(t) = 0, \qquad 0 \le t \le 1,$$

and so f(t) = 0 for t ≠ c. Since f is continuous, f = 0. Hence T has no
characteristic values (vectors).


Theorem 17. Let V be a finite-dimensional inner product space, and
let T be any linear operator on V. Suppose W is a subspace of V which is
invariant under T. Then the orthogonal complement of W is invariant
under T*.


Proof. We recall that the fact that W is invariant under T does
not mean that each vector in W is left fixed by T; it means that if α is in
W then Tα is in W. Let β be in W⊥. We must show that T*β is in W⊥,
that is, that (α|T*β) = 0 for every α in W. If α is in W, then Tα is in W,
so (Tα|β) = 0. But (Tα|β) = (α|T*β). ∎
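Theorem 17 is easy to watch in coordinates. In the pure-Python sketch below, a hypothetical upper-triangular matrix stands in for T in an orthonormal basis, so each subspace W_k spanned by the first k standard basis vectors is invariant under T. The code checks that W_k⊥, spanned by the remaining basis vectors, is carried into itself by T*. The helpers `adjoint`, `apply` and `unit` are illustrative names.

```python
def adjoint(A):
    """Conjugate transpose A*."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]


def unit(j, n):
    """The j-th standard basis vector of C^n (0-based)."""
    return [1 if i == j else 0 for i in range(n)]


# Hypothetical operator T on C^3, upper triangular in the standard orthonormal basis.
T = [[1 + 2j, 3 + 0j, -1j],
     [0j, 2 + 0j, 4 + 1j],
     [0j, 0j, -5 + 0j]]
n = len(T)
T_star = adjoint(T)

for k in range(1, n):
    # W_k = span(e_1, ..., e_k): T maps it into itself (entries below row k vanish).
    for j in range(k):
        assert all(apply(T, unit(j, n))[m] == 0 for m in range(k, n))
    # W_k-perp = span(e_{k+1}, ..., e_n): T* maps it into itself (first k entries vanish).
    for j in range(k, n):
        assert all(apply(T_star, unit(j, n))[m] == 0 for m in range(k))
print("W_k invariant under T and W_k-perp invariant under T* for every k")
```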


Theorem 18. Let V be a finite-dimensional inner product space, and 
let T be a self-adjoint linear operator on V. Then there is an orthonormal basis 
for V, each vector of which is a characteristic vector for T. 


Proof. We are assuming dim V > 0. By Theorem 16, T has a
characteristic vector α. Let α₁ = α/||α|| so that α₁ is also a characteristic
vector for T and ||α₁|| = 1. If dim V = 1, we are done. Now we proceed
by induction on the dimension of V. Suppose the theorem is true for inner
product spaces of dimension less than dim V. Let W be the one-dimensional
subspace spanned by the vector α₁. The statement that α₁ is a characteristic
vector for T simply means that W is invariant under T. By Theorem 17,
the orthogonal complement W⊥ is invariant under T* = T. Now W⊥,
with the inner product from V, is an inner product space of dimension
one less than the dimension of V. Let U be the linear operator induced
on W⊥ by T, that is, the restriction of T to W⊥. Then U is self-adjoint,
and by the induction hypothesis, W⊥ has an orthonormal basis {α₂, …, αₙ}
consisting of characteristic vectors for U. Now each of these vectors is
also a characteristic vector for T, and since V = W ⊕ W⊥, we conclude
that {α₁, …, αₙ} is the desired basis for V. ∎


Corollary. Let A be an n × n Hermitian (self-adjoint) matrix. Then
there is a unitary matrix P such that P⁻¹AP is diagonal (A is unitarily
equivalent to a diagonal matrix). If A is a real symmetric matrix, there is a
real orthogonal matrix P such that P⁻¹AP is diagonal.


Proof. Let V be Cⁿˣ¹, with the standard inner product, and let T
be the linear operator on V which is represented by A in the standard
ordered basis. Since A = A*, we have T = T*. Let ℬ = {α₁, …, αₙ}
be an ordered orthonormal basis for V, such that Tαⱼ = cⱼαⱼ, j = 1, …, n.
If D = [T]_ℬ, then D is the diagonal matrix with diagonal entries c₁, …, cₙ.
Let P be the matrix with column vectors α₁, …, αₙ. Then D = P⁻¹AP.
In case each entry of A is real, we can take V to be Rⁿ, with the
standard inner product, and repeat the argument. In this case, P will be
a unitary matrix with real entries, i.e., a real orthogonal matrix. ∎
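The corollary asserts that P exists; for real symmetric matrices a classical way to actually compute it is the Jacobi rotation method, which is a swapped-in numerical technique and not the method of the proof above. Each step applies a plane rotation J (a real orthogonal matrix) and replaces A by JᵗAJ so that one off-diagonal entry becomes zero; the product of all the rotations is the orthogonal matrix P. The pure-Python sketch below, with the hypothetical name `jacobi_eigh` and a hypothetical test matrix, is meant only to illustrate the corollary on small matrices.

```python
import math


def jacobi_eigh(A, tol=1e-12, max_sweeps=50):
    """Cyclic Jacobi method for a real symmetric matrix A (list of rows).

    Returns (d, P) where P is real orthogonal and P^t A P is (numerically)
    the diagonal matrix with diagonal entries d.  Column j of P is a unit
    characteristic vector of A for the characteristic value d[j].
    """
    n = len(A)
    D = [[float(x) for x in row] for row in A]      # becomes P^t A P
    P = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(D[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol ** 2:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if D[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the new (p, q) entry is zero.
                theta = (D[q][q] - D[p][p]) / (2 * D[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                # D <- D J (columns p and q), then D <- J^t D (rows p and q).
                for k in range(n):
                    dkp, dkq = D[k][p], D[k][q]
                    D[k][p], D[k][q] = c * dkp - s * dkq, s * dkp + c * dkq
                for k in range(n):
                    dpk, dqk = D[p][k], D[q][k]
                    D[p][k], D[q][k] = c * dpk - s * dqk, s * dpk + c * dqk
                # Accumulate P <- P J.
                for k in range(n):
                    pkp, pkq = P[k][p], P[k][q]
                    P[k][p], P[k][q] = c * pkp - s * pkq, s * pkp + c * pkq
    return [D[i][i] for i in range(n)], P


A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
d, P = jacobi_eigh(A)
print(d)
```

Because every J is orthogonal, P stays orthogonal at every step, which is exactly the "real orthogonal matrix P" of the corollary; the columns of P form an orthonormal basis of characteristic vectors.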


Combining Theorem 18 with our comments at the beginning of this
section, we have the following: If V is a finite-dimensional real inner
product space and T is a linear operator on V, then V has an orthonormal
basis of characteristic vectors for T if and only if T is self-adjoint. Equiv-
alently, if A is an n × n matrix with real entries, there is a real orthogonal
matrix P such that PᵗAP is diagonal if and only if A = Aᵗ. There is no
such result for complex symmetric matrices. In other words, for complex
matrices there is a significant difference between the conditions A = Aᵗ
and A = A*.

Having disposed of the self-adjoint case, we now return to the study 
of normal operators in general. We shall prove the analogue of Theorem 18 
for normal operators, in the complex case. There is a reason for this restric- 
tion. A normal operator on a real inner product space may not have any 
non-zero characteristic vectors. This is true, for example, of all but two 
rotations in R².


Theorem 19. Let V be a finite-dimensional inner product space and
T a normal operator on V. Suppose α is a vector in V. Then α is a charac-
teristic vector for T with characteristic value c if and only if α is a charac-
teristic vector for T* with characteristic value c̄.


Proof. Suppose U is any normal operator on V. Then ||Uα|| =
||U*α||. For using the condition UU* = U*U one sees that

$$\|U\alpha\|^2 = (U\alpha|U\alpha) = (\alpha|U^*U\alpha) = (\alpha|UU^*\alpha) = (U^*\alpha|U^*\alpha) = \|U^*\alpha\|^2.$$

If c is any scalar, the operator U = T − cI is normal. For (T − cI)* =
T* − c̄I, and it is easy to check that UU* = U*U. Thus

$$\|(T - cI)\alpha\| = \|(T^* - \bar{c}I)\alpha\|$$

so that (T − cI)α = 0 if and only if (T* − c̄I)α = 0. ∎
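Both facts used in this proof, ||Uα|| = ||U*α|| for normal U and the passage from c to c̄, can be checked numerically. The pure-Python sketch below uses a hypothetical normal matrix N = (1 + i)I + K with K = [[0, 2], [−2, 0]], and a hypothetical non-normal matrix M for contrast. The helper names are illustrative.

```python
def adjoint(A):
    """Conjugate transpose A*."""
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A[0]))]


def apply(A, x):
    """Matrix-vector product Ax."""
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]


def norm(x):
    """Euclidean norm ||x|| on C^n."""
    return sum(abs(xi) ** 2 for xi in x) ** 0.5


# N = (1+i)I + K with K = [[0, 2], [-2, 0]]; N* = (1-i)I - K commutes with N.
N = [[1 + 1j, 2 + 0j], [-2 + 0j, 1 + 1j]]
alpha = [1 + 0j, 2j]
assert abs(norm(apply(N, alpha)) - norm(apply(adjoint(N), alpha))) < 1e-12

# v = (1, i) is a characteristic vector of N for c = 1+3i, and of N* for conj(c).
v = [1 + 0j, 1j]
c = 1 + 3j
assert all(abs(w - c * vi) < 1e-12 for w, vi in zip(apply(N, v), v))
assert all(abs(w - c.conjugate() * vi) < 1e-12 for w, vi in zip(apply(adjoint(N), v), v))

# M is not normal (M M* != M* M), and the two norms differ.
M = [[1 + 0j, 1 + 0j], [0j, 1 + 0j]]
e2 = [0j, 1 + 0j]
print(norm(apply(M, e2)), norm(apply(adjoint(M), e2)))   # sqrt(2) versus 1
```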


Definition. A complex n X n matrix A is called normal if AA* = 
A*A. 


It is not so easy to understand what normality of matrices or oper-
ators really means; however, in trying to develop some feeling for the
concept, the reader might find it helpful to know that a triangular matrix 
is normal if and only if it is diagonal. 


Theorem 20. Let V be a finite-dimensional inner product space, T a
linear operator on V, and ℬ an orthonormal basis for V. Suppose that the
matrix A of T in the basis ℬ is upper triangular. Then T is normal if and
only if A is a diagonal matrix.


Proof. Since ℬ is an orthonormal basis, A* is the matrix of T*
in ℬ. If A is diagonal, then AA* = A*A, and this implies TT* = T*T.
Conversely, suppose T is normal, and let ℬ = {α₁, …, αₙ}. Then, since
A is upper-triangular, Tα₁ = A₁₁α₁. By Theorem 19 this implies T*α₁ =
Ā₁₁α₁. On the other hand,

$$T^*\alpha_1 = \sum_j (A^*)_{j1}\alpha_j = \sum_j \bar{A}_{1j}\alpha_j.$$

Therefore, A₁ⱼ = 0 for every j > 1. In particular, A₁₂ = 0, and since A
is upper-triangular, it follows that

$$T\alpha_2 = A_{22}\alpha_2.$$

Thus T*α₂ = Ā₂₂α₂, and A₂ⱼ = 0 for all j ≠ 2. Continuing in this fashion,
we find that A is diagonal. ∎
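The first step of the proof can also be read off from the (1, 1) entries of AA* and A*A: for upper-triangular A they are Σⱼ|A₁ⱼ|² and |A₁₁|², so normality forces A₁ⱼ = 0 for j > 1. The short pure-Python sketch below (the helper `is_normal` and the sample matrices are hypothetical) tests normality directly, and shows why the triangularity hypothesis matters: a rotation matrix is normal without being diagonal.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose A*."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def is_normal(A, tol=1e-12):
    """True when A A* = A* A up to the tolerance."""
    left, right = matmul(A, adjoint(A)), matmul(adjoint(A), A)
    return all(abs(x - y) < tol for rl, rr in zip(left, right) for x, y in zip(rl, rr))


upper = [[2, 1j, 0], [0, 3, 5], [0, 0, -1j]]   # upper triangular, not diagonal
diag = [[2, 0, 0], [0, 3, 0], [0, 0, -1j]]      # diagonal
rotation = [[0, -1], [1, 0]]                    # normal, but not triangular

print(is_normal(upper), is_normal(diag), is_normal(rotation))   # False True True
```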


Theorem 21. Let V be a finite-dimensional complex inner product 
space and let T be any linear operator on V. Then there is an orthonormal 
basis for V in which the matrix of T is upper triangular. 


Proof. Let n be the dimension of V. The theorem is true when
n = 1, and we proceed by induction on n, assuming the result is true for
linear operators on complex inner product spaces of dimension n − 1.
Since V is a finite-dimensional complex inner product space, there is a
unit vector α in V and a scalar c such that

$$T^*\alpha = c\alpha$$

(apply comment (1) following Theorem 16 to the operator T* and normalize).
Let W be the orthogonal complement of the subspace spanned by α and
let S be the restriction of T to W. By Theorem 17 (applied to T*, under
which the span of α is invariant), W is invariant under T** = T.
Thus S is a linear operator on W. Since W has dimension n − 1, our
inductive assumption implies the existence of an orthonormal basis
{α₁, …, αₙ₋₁} for W in which the matrix of S is upper-triangular; let
αₙ = α. Then {α₁, …, αₙ} is an orthonormal basis for V in which the
matrix of T is upper-triangular. ∎


This theorem implies the following result for matrices. 


Corollary. For every complex n × n matrix A there is a unitary matrix
U such that U⁻¹AU is upper-triangular.
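For 2 × 2 matrices the corollary can be carried out explicitly. The pure-Python sketch below (the function name `schur_2x2` and the sample matrices are hypothetical) takes a slightly different route from the proof of Theorem 21. Instead of starting from a characteristic vector of T* and placing it last, it takes a unit characteristic vector v of A itself, completes it to an orthonormal basis {v, w} of C², and uses these as the columns of U. The first column of U*AU is then (λ, 0)ᵗ. When A happens to be normal, Theorem 20 says the result is even diagonal.

```python
import cmath


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose A*."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


def schur_2x2(A, tol=1e-12):
    """Unitary U such that U* A U = U^{-1} A U is upper triangular (A is complex 2x2)."""
    (a, b), (c, d) = A
    # A root of the characteristic polynomial x^2 - (a + d)x + (ad - bc).
    lam = ((a + d) + cmath.sqrt((a - d) ** 2 + 4 * b * c)) / 2
    # A non-zero vector in the null space of A - lam*I (which has rank <= 1).
    if abs(a - lam) > tol or abs(b) > tol:
        v = [b, lam - a]
    elif abs(c) > tol or abs(d - lam) > tol:
        v = [lam - d, c]
    else:
        v = [1, 0]                      # A = lam*I: every vector works
    r = (abs(v[0]) ** 2 + abs(v[1]) ** 2) ** 0.5
    v = [complex(v[0]) / r, complex(v[1]) / r]
    w = [-v[1].conjugate(), v[0].conjugate()]   # unit vector orthogonal to v
    return [[v[0], w[0]], [v[1], w[1]]]         # columns v, w


A = [[1 + 2j, 3 + 0j], [-1j, 4 + 0j]]
U = schur_2x2(A)
R = matmul(matmul(adjoint(U), A), U)
print(abs(R[1][0]) < 1e-10)      # True: U* A U is upper triangular
```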


Now combining Theorem 21 and Theorem 20, we immediately obtain 
the following analogue of Theorem 18 for normal operators. 
Theorem 22. Let V be a finite-dimensional complex inner product
space and T a normal operator on V. Then V has an orthonormal basis con- 
sisting of characteristic vectors for T. 


Again there is a matrix interpretation. 


Corollary. For every normal matrix A there is a unitary matrix P
such that P⁻¹AP is a diagonal matrix.


Exercises 


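1. For each of the following real symmetric matrices A, find a real orthogonal
matrix P such that PᵗAP is diagonal.

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}, \qquad \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}, \qquad \begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix}$$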


2. Is a complex symmetric matrix self-adjoint? Is it normal? 


3. For

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 3 & 4 \\ 3 & 4 & 5 \end{bmatrix}$$

there is a real orthogonal matrix P such that PᵗAP = D is diagonal. Find such a
diagonal matrix D.


4. Let V be C², with the standard inner product. Let T be the linear operator on
V which is represented in the standard ordered basis by the matrix 


-[ 


Show that T is normal, and find an orthonormal basis for V, consisting of charac- 
teristic vectors for T. 


5. Give an example of a 2 × 2 matrix A such that A² is normal, but A is not
normal. 


6. Let T be a normal operator on a finite-dimensional complex inner product 
space. Prove that T is self-adjoint, positive, or unitary according as every charac- 
teristic value of T is real, positive, or of absolute value 1. (Use Theorem 22 to 
reduce to a similar question about diagonal matrices.) 


7. Let T be a linear operator on the finite-dimensional inner product space V, 
and suppose T is both positive and unitary. Prove T = I. 


8. Prove T is normal if and only if T = T₁ + iT₂ where T₁ and T₂ are self-
adjoint operators which commute.


9. Prove that a real symmetric matrix has a real symmetric cube root; i.e., if A
is real symmetric, there is a real symmetric B such that B³ = A.


10. Prove that every positive matrix is the square of a positive matrix. 
11. Prove that a normal and nilpotent operator is the zero operator.


12. If T is a normal operator, prove that characteristic vectors for T which are 
associated with distinct characteristic values are orthogonal. 


13. Let T be a normal operator on a finite-dimensional complex inner product 
space. Prove that there is a polynomial f, with complex coefficients, such that 
T* = f(T). (Represent T by a diagonal matrix, and see what f must be.)


14. If two normal operators commute, prove that their product is normal. 


9. Operators on Inner Product Spaces


9.1. Introduction


We regard most of the topics treated in Chapter 8 as fundamental, 
the material that everyone should know. The present chapter is for the 
more advanced student or for the reader who is eager to expand his knowl- 
edge concerning operators on inner product spaces. With the exception of 
the Principal Axis theorem, which is essentially just another formulation of 
Theorem 18 on the orthogonal diagonalization of self-adjoint operators, and
the other results on forms in Section 9.2, the material presented here is 
more sophisticated and generally more involved technically. We also make 
more demands of the reader, just as we did in the later parts of Chapters 
5 and 7. The arguments and proofs are written in a more condensed style, 
and there are almost no examples to smooth the way; however, we have 
seen to it that the reader is well supplied with generous sets of exercises. 

The first three sections are devoted to results concerning forms on 
inner product spaces and the relation between forms and linear operators. 
The next section deals with spectral theory, i.e., with the implications of 
Theorems 18 and 22 of Chapter 8 concerning the diagonalization of self- 
adjoint and normal operators. In the final section, we pursue the study of 
normal operators treating, in particular, the real case, and in so doing we 
examine what the primary decomposition theorem of Chapter 6 says about 
normal operators. 
9.2. Forms on Inner Product Spaces


If T is a linear operator on a finite-dimensional inner product space V,
the function f defined on V × V by

$$f(\alpha, \beta) = (T\alpha|\beta)$$

may be regarded as a kind of substitute for T. Many questions about T are
equivalent to questions concerning f. In fact, it is easy to see that f deter-
mines T. For if ℬ = {α₁, …, αₙ} is an orthonormal basis for V, then the
entries of the matrix of T in ℬ are given by

$$A_{kj} = f(\alpha_j, \alpha_k).$$

It is important to understand why f determines T from a more abstract
point of view. The crucial properties of f are described in the following
definition.


Definition. A (sesqui-linear) form on a real or complex vector space
V is a function f on V × V with values in the field of scalars such that

(a) f(cα + β, γ) = cf(α, γ) + f(β, γ)
(b) f(α, cβ + γ) = c̄f(α, β) + f(α, γ)

for all α, β, γ in V and all scalars c.


Thus, a sesqui-linear form is a function on V × V such that f(α, β)
is a linear function of α for fixed β and a conjugate-linear function of β
for fixed α. In the real case, f(α, β) is linear as a function of each argument;
in other words, f is a bilinear form. In the complex case, the sesqui- 
linear form f is not bilinear unless f = 0. In the remainder of this chapter, 
we shall omit the adjective ‘sesqui-linear’ unless it seems important to 
include it. 

If f and g are forms on V and c is a scalar, it is easy to check that 
cf + g is also a form. From this it follows that any linear combination of 
forms on V is again a form. Thus the set of all forms on V is a subspace of 
the vector space of all scalar-valued functions on V × V.


Theorem 1. Let V be a finite-dimensional inner product space and f a
form on V. Then there is a unique linear operator T on V such that

$$f(\alpha, \beta) = (T\alpha|\beta)$$

for all α, β in V, and the map f → T is an isomorphism of the space of forms
onto L(V, V).

Proof. Fix a vector β in V. Then α → f(α, β) is a linear function
on V. By Theorem 6 there is a unique vector β′ in V such that f(α, β) =
(α|β′) for every α. We define a function U from V into V by setting Uβ =
β′. Then

$$\begin{aligned} f(\alpha, c\beta + \gamma) &= (\alpha|U(c\beta + \gamma)) \\ &= \bar{c}f(\alpha, \beta) + f(\alpha, \gamma) \\ &= \bar{c}(\alpha|U\beta) + (\alpha|U\gamma) \\ &= (\alpha|cU\beta + U\gamma) \end{aligned}$$

for all α, β, γ in V and all scalars c. Thus U is a linear operator on V and
T = U* is an operator such that f(α, β) = (Tα|β) for all α and β. If we also
have f(α, β) = (T′α|β), then

$$(T\alpha - T'\alpha|\beta) = 0$$

for all α and β; so Tα = T′α for all α. Thus for each form f there is a unique
linear operator T_f such that

$$f(\alpha, \beta) = (T_f\alpha|\beta)$$

for all α, β in V. If f and g are forms and c a scalar, then

$$\begin{aligned} (cf + g)(\alpha, \beta) &= (T_{cf+g}\alpha|\beta) \\ &= cf(\alpha, \beta) + g(\alpha, \beta) \\ &= c(T_f\alpha|\beta) + (T_g\alpha|\beta) \\ &= ((cT_f + T_g)\alpha|\beta) \end{aligned}$$

for all α and β in V. Therefore,

$$T_{cf+g} = cT_f + T_g$$

so f → T_f is a linear map. For each T in L(V, V) the equation

$$f(\alpha, \beta) = (T\alpha|\beta)$$

defines a form such that T_f = T, and T_f = 0 if and only if f = 0. Thus
f → T_f is an isomorphism. ∎
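The correspondence f → T_f can be mirrored numerically on C² with its standard orthonormal basis: by the remark at the start of this section, the matrix of T_f has entries A_kj = f(εⱼ, εₖ). In the pure-Python sketch below, the form `f` is a hypothetical example written directly as a sesqui-linear expression. The code builds A from the values f(εⱼ, εₖ) and checks f(α, β) = (Aα|β) on a few random vectors (seeded for reproducibility).

```python
import random


def f(alpha, beta):
    """Hypothetical sesqui-linear form on C^2: linear in alpha, conjugate-linear in beta."""
    x1, x2 = alpha
    y1, y2 = (complex(z).conjugate() for z in beta)
    return 3 * x1 * y1 + 2j * x2 * y1 - x1 * y2 + (1 - 1j) * x2 * y2


def inner(x, y):
    """Standard inner product (x|y) = sum of x_i * conj(y_i)."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))


def apply(M, x):
    """Matrix-vector product Mx."""
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


n = 2
e = [[1, 0], [0, 1]]                       # standard orthonormal basis of C^2
# Matrix of T_f in this basis: A[k][j] = f(e_j, e_k).
A = [[f(e[j], e[k]) for j in range(n)] for k in range(n)]

rng = random.Random(0)
for _ in range(5):
    alpha = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    beta = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]
    assert abs(f(alpha, beta) - inner(apply(A, alpha), beta)) < 1e-12
print(A)
```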
Corollary. The equation

$$(f|g) = \operatorname{tr}(T_fT_g^*)$$

defines an inner product on the space of forms with the property that

$$(f|g) = \sum_{j,k} f(\alpha_k, \alpha_j)\overline{g(\alpha_k, \alpha_j)}$$

for every orthonormal basis {α₁, …, αₙ} of V.


Proof. It follows easily from Example 3 of Chapter 8 that
(T, U) → tr (TU*) is an inner product on L(V, V). Since f → T_f is an
isomorphism, Example 6 of Chapter 8 shows that

$$(f|g) = \operatorname{tr}(T_fT_g^*)$$

is an inner product. Now suppose that A and B are the matrices of T_f and
T_g in the orthonormal basis ℬ = {α₁, …, αₙ}. Then

$$A_{jk} = (T_f\alpha_k|\alpha_j) = f(\alpha_k, \alpha_j)$$

and Bⱼₖ = (T_gαₖ|αⱼ) = g(αₖ, αⱼ). Since AB* is the matrix of T_fT_g* in the
basis ℬ, it follows that

$$(f|g) = \operatorname{tr}(AB^*) = \sum_{j,k} A_{jk}\overline{B_{jk}}. \qquad\blacksquare$$


Definition. If f is a form and ℬ = {α₁, …, αₙ} an arbitrary ordered
basis of V, the matrix A with entries

$$A_{kj} = f(\alpha_j, \alpha_k)$$

is called the matrix of f in the ordered basis ℬ.


When ℬ is an orthonormal basis, the matrix of f in ℬ is also the matrix
of the linear transformation T_f, but in general this is not the case.

If A is the matrix of f in the ordered basis ℬ = {α₁, …, αₙ}, it follows
that

$$f\Big(\sum_s x_s\alpha_s, \sum_r y_r\alpha_r\Big) = \sum_{r,s} \bar{y}_r A_{rs} x_s \tag{9-1}$$

for all scalars xₛ and yᵣ (1 ≤ r, s ≤ n). In other words, the matrix A has
the property that

$$f(\alpha, \beta) = Y^*AX$$

where X and Y are the respective coordinate matrices of α and β in the
ordered basis ℬ.

The matrix of f in another basis

$$\alpha'_j = \sum_{i=1}^{n} P_{ij}\alpha_i \qquad (1 \le j \le n)$$

is given by the equation

$$A' = P^*AP. \tag{9-2}$$

For

$$\begin{aligned} A'_{kj} &= f(\alpha'_j, \alpha'_k) \\ &= f\Big(\sum_s P_{sj}\alpha_s, \sum_r P_{rk}\alpha_r\Big) \\ &= \sum_{r,s} \bar{P}_{rk}A_{rs}P_{sj} \\ &= (P^*AP)_{kj}. \end{aligned}$$

Since P* = P⁻¹ for unitary matrices, it follows from (9-2) that results
concerning unitary equivalence may be applied to the study of forms.
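Equation (9-2) holds for any invertible P, not only unitary ones, and it is easy to test. The pure-Python sketch below uses a hypothetical form f on C² (written as a sesqui-linear expression) and a hypothetical invertible, non-unitary P. It computes the matrix A of f in the standard basis, then the matrix A′ in the basis α′ⱼ = Σᵢ Pᵢⱼεᵢ directly from the definition A′ₖⱼ = f(α′ⱼ, α′ₖ), and compares it with P*AP. It also checks (9-1), in the form f(α, β) = Y*A′X, for coordinates taken in the new basis.

```python
def f(alpha, beta):
    """Hypothetical sesqui-linear form on C^2: linear in alpha, conjugate-linear in beta."""
    x1, x2 = alpha
    y1, y2 = (complex(z).conjugate() for z in beta)
    return 3 * x1 * y1 + 2j * x2 * y1 - x1 * y2 + (1 - 1j) * x2 * y2


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def adjoint(A):
    """Conjugate transpose A*."""
    return [[complex(A[j][i]).conjugate() for j in range(len(A))]
            for i in range(len(A[0]))]


n = 2
e = [[1, 0], [0, 1]]
A = [[f(e[j], e[k]) for j in range(n)] for k in range(n)]     # matrix of f in the standard basis

P = [[1, 1], [2j, -1]]                                         # invertible, not unitary
new_basis = [[P[i][j] for i in range(n)] for j in range(n)]    # alpha'_j = column j of P
A_prime = [[f(new_basis[j], new_basis[k]) for j in range(n)] for k in range(n)]

PAP = matmul(matmul(adjoint(P), A), P)
assert all(abs(A_prime[k][j] - PAP[k][j]) < 1e-12 for k in range(n) for j in range(n))

# (9-1) in the new basis: f(alpha, beta) = Y* A' X, where X, Y are coordinates in that basis.
X, Y = [2, -1j], [1 + 1j, 3]
alpha = [sum(X[j] * new_basis[j][i] for j in range(n)) for i in range(n)]
beta = [sum(Y[j] * new_basis[j][i] for j in range(n)) for i in range(n)]
YAX = sum(complex(Y[r]).conjugate() * A_prime[r][s] * X[s] for r in range(n) for s in range(n))
assert abs(f(alpha, beta) - YAX) < 1e-12
print(A_prime)
```

Note that P* appears in (9-2) even though P is not unitary; only when P is unitary does P* coincide with P⁻¹.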


Theorem 2. Let f be a form on a finite-dimensional complex inner 
product space V. Then there is an orthonormal basis for V in which the matrix 
of f is upper-triangular. 


Proof. Let T be the linear operator on V such that f(α, β) =
(Tα|β) for all α and β. By Theorem 21, there is an orthonormal basis
{α₁, …, αₙ} in which the matrix of T is upper-triangular. Hence,

$$f(\alpha_k, \alpha_j) = (T\alpha_k|\alpha_j) = 0$$

when j > k. ∎


Definition. A form f on a real or complex vector space V is called
Hermitian if

$$f(\alpha, \beta) = \overline{f(\beta, \alpha)}$$

for all α and β in V.


If T is a linear operator on a finite-dimensional inner product space V
and f is the form

$$f(\alpha, \beta) = (T\alpha|\beta)$$

then the complex conjugate of f(β, α) is (α|Tβ) = (T*α|β); so f is Hermitian
if and only if T is self-adjoint.

When f is Hermitian, f(α, α) is real for every α, and on complex spaces
this property characterizes Hermitian forms.


Theorem 3. Let V be a complex vector space and f a form on V such
that f(α, α) is real for every α. Then f is Hermitian.


Proof. Let α and β be vectors in V. We must show that f(α, β) is the
complex conjugate of f(β, α). Now

$$f(\alpha + \beta, \alpha + \beta) = f(\alpha, \alpha) + f(\alpha, \beta) + f(\beta, \alpha) + f(\beta, \beta).$$

Since f(α + β, α + β), f(α, α), and f(β, β) are real, the number f(α, β) +
f(β, α) is real. Looking at the same argument with α + iβ instead of α + β,
we see that −if(α, β) + if(β, α) is real. Having concluded that two num-
bers are real, we set them equal to their complex conjugates and obtain

$$\begin{aligned} f(\alpha, \beta) + f(\beta, \alpha) &= \overline{f(\alpha, \beta)} + \overline{f(\beta, \alpha)} \\ -if(\alpha, \beta) + if(\beta, \alpha) &= i\overline{f(\alpha, \beta)} - i\overline{f(\beta, \alpha)}. \end{aligned}$$

If we multiply the second equation by i and add the result to the first
equation, we obtain

$$2f(\alpha, \beta) = 2\overline{f(\beta, \alpha)}. \qquad\blacksquare$$
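Behind Theorem 3 is the fact that, on a complex space, a form is completely determined by its values f(α, α). One standard way to make this explicit (it is not used in the text) is the polarization identity f(α, β) = ¼ Σₘ iᵐ f(α + iᵐβ, α + iᵐβ), the sum running over m = 0, 1, 2, 3. The pure-Python sketch below checks the identity for a hypothetical non-Hermitian form, and checks that a form given by a hypothetical Hermitian matrix takes real values f(α, α) and has f(α, β) equal to the complex conjugate of f(β, α).

```python
import random


def form(A):
    """The form f(x, y) = Y* A X on C^n determined by the square matrix A."""
    n = len(A)

    def f(x, y):
        return sum(complex(y[r]).conjugate() * A[r][s] * x[s]
                   for r in range(n) for s in range(n))
    return f


def polarize(f, x, y):
    """Recover f(x, y) from the values f(v, v) alone (polarization identity)."""
    total = 0
    for u in (1, 1j, -1, -1j):                     # u = i^m, m = 0, 1, 2, 3
        v = [xi + u * yi for xi, yi in zip(x, y)]
        total += u * f(v, v)
    return total / 4


rng = random.Random(1)


def rand_vec(n):
    """A random vector in C^n (reproducible, seeded generator)."""
    return [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


g = form([[1, 2j], [0, -3]])                       # not Hermitian
h = form([[2, 1 - 1j], [1 + 1j, -1]])              # Hermitian: A = A*

x, y = rand_vec(2), rand_vec(2)
assert abs(polarize(g, x, y) - g(x, y)) < 1e-12
assert abs(h(x, x).imag) < 1e-12                   # f(alpha, alpha) is real
assert abs(h(x, y) - h(y, x).conjugate()) < 1e-12  # f is Hermitian
print(g(x, y), polarize(g, x, y))
```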


Corollary. Let T be a linear operator on a complex finite-dimensional
inner product space V. Then T is self-adjoint if and only if (Tα|α) is real for
every α in V.


Theorem 4 (Principal Axis Theorem). For every Hermitian form f 
on a finite-dimensional inner product space V, there is an orthonormal basis of 
V in which f is represented by a diagonal matrix with real entries. 
Proof. Let T be the linear operator such that f(α, β) = (Tα|β) for
all α and β in V. Then, since f(α, β) is the complex conjugate of
f(β, α) = (Tβ|α), and the complex conjugate of (Tβ|α) is (α|Tβ), it
follows that

$$(T\alpha|\beta) = \overline{f(\beta, \alpha)} = (\alpha|T\beta)$$

for all α and β; hence T = T*. By Theorem 18 of Chapter 8, there is an
orthonormal basis of V which consists of characteristic vectors for T.
Suppose {α₁, …, αₙ} is an orthonormal basis and that

$$T\alpha_j = c_j\alpha_j$$

for 1 ≤ j ≤ n. Then

$$f(\alpha_k, \alpha_j) = (T\alpha_k|\alpha_j) = \delta_{kj}c_k$$

and by Theorem 15 of Chapter 8 each cₖ is real. ∎


Corollary. Under the above conditions

$$f\Big(\sum_j x_j\alpha_j, \sum_k y_k\alpha_k\Big) = \sum_j c_j x_j \bar{y}_j.$$
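The corollary says that in a basis of characteristic vectors a Hermitian form becomes a weighted sum of the products xⱼȳⱼ. The pure-Python sketch below checks this for a hypothetical Hermitian form f(α, β) = Y*AX on C² with A = [[a, b], [b̄, d]]. It takes unit characteristic vectors α₁, α₂ of A from the explicit 2 × 2 formulas (so it assumes b ≠ 0), then compares f(Σ xⱼαⱼ, Σ yₖαₖ) with Σ cⱼxⱼȳⱼ for sample coordinates.

```python
import math


def f(x, y, A):
    """The form f(x, y) = Y* A X on C^2."""
    return sum(complex(y[r]).conjugate() * A[r][s] * x[s] for r in range(2) for s in range(2))


a, b, d = 1.0, 2 - 1j, 4.0                 # hypothetical real a, d and complex b != 0
A = [[a, b], [b.conjugate(), d]]           # Hermitian matrix of f in the standard basis

root = math.sqrt((a - d) ** 2 + 4 * abs(b) ** 2)
c = [(a + d + root) / 2, (a + d - root) / 2]     # real characteristic values

basis = []
for cj in c:
    v = [b, cj - a]                         # characteristic vector for cj (b != 0)
    r = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    basis.append([v[0] / r, v[1] / r])      # normalize: {alpha_1, alpha_2} orthonormal

x, y = [1 + 2j, -0.5], [3j, 2 - 1j]         # coordinates in the new basis
alpha = [sum(x[j] * basis[j][i] for j in range(2)) for i in range(2)]
beta = [sum(y[j] * basis[j][i] for j in range(2)) for i in range(2)]

lhs = f(alpha, beta, A)
rhs = sum(c[j] * x[j] * complex(y[j]).conjugate() for j in range(2))
assert abs(lhs - rhs) < 1e-10
print(lhs, rhs)
```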


Exercises 


1. Which of the following functions f, defined on vectors α = (x₁, x₂) and β =
(y₁, y₂) in C², are (sesqui-linear) forms on C²?

(a) f(α, β) = 1.

(b) f(α, β) = (x₁ − ȳ₁)² + x₂ȳ₂.

(c) f(α, β) = (x₁ + ȳ₁)² − (x₁ − ȳ₁)².

(d) f(α, β) = x₁ȳ₂ − x̄₂y₁.


2. Let f be the form on R² defined by
F(a, yr), (22, y2)) = ays + T22 
Find the matrix of f in each of the following bases: 


{(1, 0), (0, 1)}, {(1, −1), (1, 1)}, {(1, 2), (3, 4)}.


3. Let

1 7
DE A

and let g be the form (on the space of 2 × 1 complex matrices) defined by g(X, Y) =
Y*AX. Is g an inner product?


4. Let V be a complex vector space and let f be a (sesqui-linear) form on V which is symmetric: f(α, β) = f(β, α). What is f?


5. Let f be the form on R² given by

$$f((x_1, x_2), (y_1, y_2)) = x_1y_1 + 4x_2y_2 + 2x_1y_2 + 2x_2y_1.$$

Find an ordered basis in which f is represented by a diagonal matrix.


6. Call the form f (left) non-degenerate if 0 is the only vector α such that f(α, β) = 0 for all β. Let f be a form on an inner product space V. Prove that f is non-degenerate if and only if the associated linear operator T_f (Theorem 1) is non-singular.


7. Let f be a form on a finite-dimensional vector space V. Look at the definition 
of left non-degeneracy given in Exercise 6. Define right non-degeneracy and prove 
that the form f is left non-degenerate if and only if f is right non-degenerate. 

8. Let f be a non-degenerate form (Exercises 6 and 7) on a finite-dimensional space V. Let L be a linear functional on V. Show that there exists one and only one vector β in V such that L(α) = f(α, β) for all α.

9. Let f be a non-degenerate form on a finite-dimensional space V. Show that each linear operator S has an 'adjoint relative to f,' i.e., an operator S' such that

$$f(S\alpha, \beta) = f(\alpha, S'\beta) \quad \text{for all } \alpha, \beta.$$


9.3. Positive Forms 


In this section, we shall discuss non-negative (sesqui-linear) forms 
and their relation to a given inner product on the underlying vector space. 


Definitions. A form f on a real or complex vector space V is non-negative if it is Hermitian and f(α, α) ≥ 0 for every α in V. The form f is positive if f is Hermitian and f(α, α) > 0 for all α ≠ 0.


A positive form on V is simply an inner product on V. A non-negative 
form satisfies all of the properties of an inner product except that some non- 
zero vectors may be ‘orthogonal’ to themselves. 

Let f be a form on the finite-dimensional space V. Let 𝔅 = {α_1, ..., α_n} be an ordered basis for V, and let A be the matrix of f in the basis 𝔅, that is, A_jk = f(α_k, α_j). If α = x_1α_1 + ⋯ + x_nα_n, then

$$\begin{aligned} f(\alpha,\alpha) &= f\Big(\sum_j x_j\alpha_j, \sum_k x_k\alpha_k\Big) \\ &= \sum_j\sum_k x_j\bar{x}_k f(\alpha_j,\alpha_k) \\ &= \sum_j\sum_k A_{kj}x_j\bar{x}_k. \end{aligned}$$

So, we see that f is non-negative if and only if

$$A = A^*$$

and

$$\sum_j\sum_k A_{kj}x_j\bar{x}_k \ge 0 \quad \text{for all scalars } x_1, \dots, x_n. \tag{9-3}$$

In order that f should be positive, the inequality in (9-3) must be strict for all (x_1, ..., x_n) ≠ 0. The conditions we have derived state that f is a positive form on V if and only if the function

$$g(X, Y) = Y^*AX$$

is a positive form on the space of n × 1 column matrices over the scalar field.
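It is easy to get the index convention A_jk = f(α_k, α_j) backwards. The sketch below uses the standard basis of C² and a sample Hermitian form f, which is hypothetical and chosen only for illustration. It builds A entry by entry from f and checks that f(α, α) agrees with the double sum in the expansion above and with g(X, X) = X*AX.

```python
def f(x, y):
    """Hypothetical Hermitian form on C^2: linear in x, conjugate-linear in y."""
    return (x[0] * y[0].conjugate() + 2j * x[1] * y[0].conjugate()
            - 2j * x[0] * y[1].conjugate() + 3 * x[1] * y[1].conjugate())


def matrix_of_form(form, n):
    """A_jk = form(e_k, e_j) in the standard basis e_1, ..., e_n."""
    e = [[1 if i == j else 0 for i in range(n)] for j in range(n)]
    return [[form(e[k], e[j]) for k in range(n)] for j in range(n)]


def double_sum(A, x):
    """sum_j sum_k A_kj x_j conj(x_k), the last line of the expansion of f(alpha, alpha)."""
    n = len(A)
    return sum(A[k][j] * x[j] * x[k].conjugate() for j in range(n) for k in range(n))


def g(A, x, y):
    """g(X, Y) = Y* A X for column vectors stored as lists."""
    n = len(A)
    return sum(y[j].conjugate() * A[j][k] * x[k] for j in range(n) for k in range(n))


A = matrix_of_form(f, 2)   # entries 1, 2i / -2i, 3 (stored as Python complex numbers)
x = [1 + 2j, -1j]          # coordinates x_1, x_2 of a sample vector alpha
print(f(x, x), double_sum(A, x), g(A, x, x))   # all three equal 12 (as complex numbers)
```
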


Theorem 5. Let F be the field of real numbers or the field of complex numbers. Let A be an n × n matrix over F. The function g defined by

$$g(X, Y) = Y^*AX \tag{9-4}$$

is a positive form on the space F^{n×1} if and only if there exists an invertible n × n matrix P with entries in F such that A = P*P.


Proof. For any n × n matrix A, the function g in (9-4) is a form on the space of column matrices. We are trying to prove that g is positive if and only if A = P*P. First, suppose A = P*P. Then g is Hermitian and

$$g(X, X) = X^*P^*PX = (PX)^*PX \ge 0.$$

If P is invertible and X ≠ 0, then PX ≠ 0, so (PX)*PX > 0.

Now, suppose that g is a positive form on the space of column matrices. Then it is an inner product, and hence there exist column matrices Q_1, ..., Q_n (an orthonormal basis for this inner product) such that

$$\delta_{jk} = g(Q_j, Q_k) = Q_k^*AQ_j.$$

But this just says that, if Q is the matrix with columns Q_1, ..., Q_n, then Q*AQ = I. Since {Q_1, ..., Q_n} is a basis, Q is invertible. Let P = Q^{-1}. Then A = (Q^{-1})*Q^{-1}, and we have A = P*P. ∎
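Theorem 5 only asserts that some invertible P exists. In practice, a common way to produce one is the Cholesky factorization, which yields an upper-triangular P with positive diagonal entries. This is a standard technique swapped in here; it is not the construction used in the proof above, which starts from an orthonormal basis for g. Below is a minimal pure-Python sketch that assumes A is a Hermitian matrix given as a list of rows. The function names are hypothetical.

```python
import math


def cholesky_upper(A):
    """Return an upper-triangular P with A = P* P for a Hermitian positive A.

    Raises ValueError when a non-positive pivot appears, which happens exactly
    when A is not positive.
    """
    n = len(A)
    P = [[0j] * n for _ in range(n)]
    for i in range(n):
        # (P*P)_ii = sum_{r <= i} |P_ri|^2; solve for the real, positive P_ii.
        d = (A[i][i] - sum(abs(P[r][i]) ** 2 for r in range(i))).real
        if d <= 0:
            raise ValueError("matrix is not positive")
        P[i][i] = math.sqrt(d)
        for j in range(i + 1, n):
            # (P*P)_ij = sum_{r <= i} conj(P_ri) P_rj; solve for P_ij.
            s = A[i][j] - sum(P[r][i].conjugate() * P[r][j] for r in range(i))
            P[i][j] = s / P[i][i]
    return P


def conj_transpose(X):
    return [[X[j][i].conjugate() for j in range(len(X))] for i in range(len(X[0]))]


def matmul(X, Y):
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[4, 2j], [-2j, 2]]   # Hermitian and positive
P = cholesky_upper(A)
print(P)                  # [[2.0, 1j], [0j, 1.0]]
```

For example, calling `cholesky_upper([[1, 2], [2, 1]])` raises `ValueError`, because that matrix is Hermitian but not positive.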


In practice, it is not easy to verify that a given matrix A satisfies the criteria for positivity which we have given thus far. One consequence of the last theorem is that if g is positive then det A > 0, because det A = det(P*P) = det P* det P = |det P|². The fact that det A > 0 is by no means sufficient to guarantee that g is positive. For example, if A = −I with n even, then det A = 1, yet g(X, X) = −X*X < 0 for X ≠ 0. However, there are n determinants associated with A which have this property: If A = A* and if each of those determinants is positive, then g is a positive form.


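Definition. Let A be an n × n matrix over the field F. The principal minors of A are the scalars Δ_k(A) defined by

$$\Delta_k(A) = \det\begin{bmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} \end{bmatrix}, \qquad 1 \le k \le n.$$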
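The principal minors can be computed mechanically. The sketch below uses exact rational arithmetic and assumes real entries. It computes Δ_1(A), ..., Δ_n(A) by Gaussian elimination on each upper-left block, and it also shows the −I example from the remark above, where det A > 0 but Δ_1 < 0. The helper names are hypothetical.

```python
from fractions import Fraction


def det(M):
    """Determinant by Gaussian elimination over the rationals (real entries only)."""
    M = [[Fraction(x) for x in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result          # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return result


def principal_minors(A):
    """[Delta_1(A), ..., Delta_n(A)]: determinants of the upper-left k x k blocks."""
    n = len(A)
    return [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
print(principal_minors(A))                    # [Fraction(2, 1), Fraction(3, 1), Fraction(4, 1)]

# det(-I) = 1 > 0 for 2 x 2 matrices, but the first principal minor of -I is -1.
print(principal_minors([[-1, 0], [0, -1]]))   # [Fraction(-1, 1), Fraction(1, 1)]
```
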


Lemma. Let A be an invertible n × n matrix with entries in a field F. The following two statements are equivalent.

(a) There is an upper-triangular matrix P with P_kk = 1 (1 ≤ k ≤ n) such that the matrix B = AP is lower-triangular.
(b) The principal minors of A are all different from 0.
Proof. Let P be any n × n matrix and set B = AP. Then

$$B_{jk} = \sum_r A_{jr}P_{rk}.$$

If P is upper-triangular and P_kk = 1 for every k, then

$$\sum_{r=1}^{k-1} A_{jr}P_{rk} = B_{jk} - A_{jk}, \qquad k > 1.$$

Now B is lower-triangular provided B_jk = 0 for j < k. Thus B will be lower-triangular if and only if

$$\sum_{r=1}^{k-1} A_{jr}P_{rk} = -A_{jk}, \qquad 1 \le j \le k-1, \quad 2 \le k \le n. \tag{9-5}$$

So, we see that statement (a) in the lemma is equivalent to the statement that there exist scalars P_rk, 1 ≤ r ≤ k, 1 ≤ k ≤ n, which satisfy (9-5) and P_kk = 1, 1 ≤ k ≤ n.

In (9-5), for each k > 1 we have a system of k − 1 linear equations for the unknowns P_{1k}, P_{2k}, ..., P_{k−1,k}. The coefficient matrix of that system is

$$\begin{bmatrix} A_{11} & \cdots & A_{1,k-1} \\ \vdots & & \vdots \\ A_{k-1,1} & \cdots & A_{k-1,k-1} \end{bmatrix}$$

and its determinant is the principal minor Δ_{k−1}(A). If each Δ_{k−1}(A) ≠ 0, the systems (9-5) have unique solutions. We have shown that statement (b) implies statement (a) and that the matrix P is unique.

Now suppose that (a) holds. Then, as we shall see,

$$\Delta_k(A) = \Delta_k(B) = B_{11}B_{22}\cdots B_{kk}, \qquad k = 1, \dots, n. \tag{9-6}$$

To verify (9-6), let A_1, ..., A_n and B_1, ..., B_n be the columns of A and B, respectively. Then

$$B_1 = A_1, \qquad B_r = \sum_{j=1}^{r-1} P_{jr}A_j + A_r, \quad r > 1. \tag{9-7}$$

Fix k, 1 ≤ k ≤ n. From (9-7) we see that the rth column of the matrix

$$\begin{bmatrix} B_{11} & \cdots & B_{1k} \\ \vdots & & \vdots \\ B_{k1} & \cdots & B_{kk} \end{bmatrix}$$

is obtained by adding to the rth column of

$$\begin{bmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} \end{bmatrix}$$

a linear combination of its preceding columns. Such operations do not change determinants. That proves (9-6), except for the trivial observation that because B is triangular Δ_k(B) = B_11 ⋯ B_kk. Since A and P are invertible, B is invertible. Therefore,

$$\Delta_n(B) = B_{11}\cdots B_{nn} \ne 0,$$

so every B_kk ≠ 0, and by (9-6) Δ_k(A) ≠ 0, k = 1, ..., n. ∎
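The proof that (b) implies (a) is constructive. Column k of P is obtained by solving the system (9-5), whose coefficient matrix is the leading block of A. The sketch below follows that recipe in exact rational arithmetic for real matrices. The sample matrix and helper names are hypothetical, and indices in the code are 0-based. For the sample matrix, it also confirms (9-6): the diagonal of B = AP is 2, 1, 2, and the partial products 2, 2, 4 are the principal minors of A.

```python
from fractions import Fraction


def solve(M, b):
    """Solve M p = b exactly by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def lemma_P(A):
    """Unit upper-triangular P with AP lower-triangular, built column by column from (9-5).

    For the 0-based column index k >= 1 we solve sum_{r<k} A[j][r] P[r][k] = -A[j][k]
    (j < k); the coefficient matrix is the leading k x k block, with determinant Delta_k(A).
    """
    n = len(A)
    P = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(1, n):
        block = [row[:k] for row in A[:k]]
        rhs = [-A[j][k] for j in range(k)]
        for r, value in enumerate(solve(block, rhs)):
            P[r][k] = value
    return P


def matmul(X, Y):
    return [[sum(Fraction(X[i][r]) * Y[r][j] for r in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]   # principal minors 2, 2, 4
P = lemma_P(A)
B = matmul(A, P)
print([B[i][i] for i in range(3)])      # [Fraction(2, 1), Fraction(1, 1), Fraction(2, 1)]
```
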


Theorem 6. Let f be a form on a finite-dimensional vector space V and let A be the matrix of f in an ordered basis 𝔅. Then f is a positive form if and only if A = A* and the principal minors of A are all positive.


Proof. Let's do the interesting half of the theorem first. Suppose that A = A* and Δ_k(A) > 0, 1 ≤ k ≤ n. By the lemma, there exists a (unique) upper-triangular matrix P with P_kk = 1 such that B = AP is lower-triangular. The matrix P* is lower-triangular, so that P*B = P*AP is also lower-triangular. Since A is self-adjoint, the matrix D = P*AP is self-adjoint. A self-adjoint triangular matrix is necessarily a diagonal matrix. By the same reasoning which led to (9-6),

$$\Delta_k(D) = \Delta_k(P^*B) = \Delta_k(B) = \Delta_k(A).$$

Since D is diagonal, its principal minors are

$$\Delta_k(D) = D_{11}\cdots D_{kk}.$$

From Δ_k(D) > 0, 1 ≤ k ≤ n, we obtain D_kk > 0 for each k.

If A is the matrix of the form f in the ordered basis 𝔅 = {α_1, ..., α_n}, then D = P*AP is the matrix of f in the basis {α'_1, ..., α'_n} defined by

$$\alpha'_j = \sum_{i=1}^{n} P_{ij}\alpha_i.$$

See (9-2). Since D is diagonal with positive entries on its diagonal, it is obvious that

$$X^*DX > 0, \qquad X \ne 0,$$

from which it follows that f is a positive form.

Now, suppose we start with a positive form f. We know that A = A*. How do we show that Δ_k(A) > 0, 1 ≤ k ≤ n? Let V_k be the subspace spanned by α_1, ..., α_k and let f_k be the restriction of f to V_k × V_k. Evidently f_k is a positive form on V_k, and, in the basis {α_1, ..., α_k}, it is represented by the matrix

$$\begin{bmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} \end{bmatrix}.$$

As a consequence of Theorem 5, we noted that the positivity of a form implies that the determinant of any representing matrix is positive; hence Δ_k(A) > 0. ∎
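The proof shows that the diagonal entries of D = P*AP satisfy D_kk = Δ_k(A)/Δ_{k−1}(A), with Δ_0 = 1. Gaussian elimination without row exchanges produces the same numbers as its pivots, because the product of the first k pivots is Δ_k(A). The sketch below uses that observation to give a practical version of the Theorem 6 test for real matrices. This elimination shortcut is our choice and is not the book's construction, and the helper names are hypothetical.

```python
from fractions import Fraction


def elimination_pivots(A):
    """Pivots of Gaussian elimination on A without row exchanges.

    The product of the first k pivots equals Delta_k(A); for symmetric A with
    nonzero principal minors the pivots are the diagonal entries of D = P^t A P.
    """
    M = [[Fraction(x) for x in row] for row in A]
    n = len(M)
    pivots = []
    for c in range(n):
        if M[c][c] == 0:
            raise ValueError("a principal minor vanishes")
        pivots.append(M[c][c])
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= factor * M[c][k]
    return pivots


def is_positive_real(A):
    """Theorem 6 for a real matrix: A = A^t and every principal minor positive."""
    n = len(A)
    if any(A[i][j] != A[j][i] for i in range(n) for j in range(n)):
        return False
    try:
        return all(p > 0 for p in elimination_pivots(A))
    except ValueError:
        return False


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
print(elimination_pivots(A))               # [Fraction(2, 1), Fraction(3, 2), Fraction(4, 3)]
print(is_positive_real(A))                 # True
print(is_positive_real([[1, 2], [2, 1]]))  # False: the second pivot is -3
print(is_positive_real([[1, 1], [-1, 1]])) # False: not symmetric
```
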


There are some comments we should make, in order to complete our discussion of the relation between positive forms and matrices. What is it that characterizes the matrices which represent positive forms? If f is a form on a complex vector space and A is the matrix of f in some ordered basis, then f will be positive if and only if A = A* and

$$X^*AX > 0, \qquad \text{for all complex } X \ne 0. \tag{9-8}$$

It follows from Theorem 3 that the condition A = A* is redundant, i.e., that (9-8) implies A = A*. On the other hand, if we are dealing with a real vector space the form f will be positive if and only if A = A^t and

$$X^tAX > 0, \qquad \text{for all real } X \ne 0. \tag{9-9}$$

We want to emphasize that if a real matrix A satisfies (9-9), it does not follow that A = A^t. One thing which is true is that, if A = A^t and (9-9) holds, then (9-8) holds as well. That is because

$$\begin{aligned} (X + iY)^*A(X + iY) &= (X^t - iY^t)A(X + iY) \\ &= X^tAX + Y^tAY + i[X^tAY - Y^tAX] \end{aligned}$$

and if A = A^t then Y^tAX = X^tAY.
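Here is a concrete instance of the warning that (9-9) does not force symmetry. For the real matrix A with rows (1, 1) and (−1, 1), the cross terms in X^tAX cancel, so X^tAX = x_1² + x_2² > 0 for real X ≠ 0, even though A ≠ A^t. For the complex vector X = (1, i), however, X*AX is not even real. The sketch checks both facts. The matrix is our own example.

```python
A = [[1, 1], [-1, 1]]   # real, but A != A^t


def quad_real(A, x):
    """X^t A X for a real column vector x."""
    n = len(A)
    return sum(x[j] * A[j][k] * x[k] for j in range(n) for k in range(n))


def quad_complex(A, x):
    """X* A X for a complex column vector x."""
    n = len(A)
    return sum(x[j].conjugate() * A[j][k] * x[k] for j in range(n) for k in range(n))


# (9-9) holds: the cross terms x1*x2 - x2*x1 cancel, leaving x1^2 + x2^2.
grid = [(a, b) for a in range(-3, 4) for b in range(-3, 4) if (a, b) != (0, 0)]
print(all(quad_real(A, [a, b]) == a * a + b * b > 0 for a, b in grid))  # True

# (9-8) fails: with X = (1, i) the value X* A X is not real.
print(quad_complex(A, [1, 1j]))  # (2+2j)
```
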

If A is an n × n matrix with complex entries and if A satisfies (9-8), we shall call A a positive matrix. The comments which we have just made may be summarized by saying this: In either the real or complex case, a form f is positive if and only if its matrix in some (in fact, every) ordered basis is a positive matrix.

Now suppose that V is a finite-dimensional inner product space. Let f be a non-negative form on V. There is a unique self-adjoint linear operator T on V such that

$$f(\alpha, \beta) = (T\alpha|\beta), \tag{9-10}$$

and T has the additional property that (Tα|α) ≥ 0.

Definition. A linear operator T on a finite-dimensional inner product space V is non-negative if T = T* and (Tα|α) ≥ 0 for all α in V. A positive linear operator is one such that T = T* and (Tα|α) > 0 for all α ≠ 0.
If V is a finite-dimensional (real or complex) vector space and if (·|·) is an inner product on V, there is an associated class of positive linear operators on V. Via (9-10) there is a one-one correspondence between that class of positive operators and the collection of all positive forms on V. We shall use the exercises for this section to emphasize the relationships between positive operators, positive forms, and positive matrices. The following summary may be helpful.

If A is an n × n matrix over the field of complex numbers, the following are equivalent.

(1) A is positive, i.e., $\sum_j\sum_k A_{kj}x_j\bar{x}_k > 0$ whenever x_1, ..., x_n are complex numbers, not all 0.

(2) (X|Y) = Y*AX is an inner product on the space of n × 1 complex matrices.

(3) Relative to the standard inner product (X|Y) = Y*X on n × 1 matrices, the linear operator X → AX is positive.

(4) A = P*P for some invertible n × n matrix P over C.

(5) A = A*, and the principal minors of A are positive.

If each entry of A is real, these are equivalent to:

(6) A = A^t, and $\sum_j\sum_k A_{kj}x_jx_k > 0$ whenever x_1, ..., x_n are real numbers, not all 0.

(7) (X|Y) = Y^tAX is an inner product on the space of n × 1 real matrices.

(8) Relative to the standard inner product (X|Y) = Y^tX on n × 1 real matrices, the linear operator X → AX is positive.

(9) There is an invertible n × n matrix P, with real entries, such that A = P^tP.


Exercises 


1. Let V be C², with the standard inner product. For which vectors α in V is there a positive linear operator T such that α = Tε_1?

2. Let V be R², with the standard inner product. If θ is a real number, let T_θ be the linear operator 'rotation through θ,'

$$T_\theta(x_1, x_2) = (x_1\cos\theta - x_2\sin\theta,\; x_1\sin\theta + x_2\cos\theta).$$

For which values of θ is T_θ a positive operator?
3. Let V be the space of n × 1 matrices over C, with the inner product (X|Y) = Y*GX (where G is an n × n matrix such that this is an inner product). Let A be an n × n matrix and T the linear operator T(X) = AX. Find T*. If Y is a fixed element of V, find the element Z of V which determines the linear functional X → Y*X. In other words, find Z such that Y*X = (X|Z) for all X in V.
4. Let V be a finite-dimensional inner product space. If T and U are positive linear operators on V, prove that (T + U) is positive. Give an example which shows that TU need not be positive.


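5. Let

$$A = \begin{bmatrix} 1 & \tfrac{1}{2} \\ \tfrac{1}{2} & \tfrac{1}{3} \end{bmatrix}.$$

(a) Show that A is positive.
(b) Let V be the space of 2 × 1 real matrices, with the inner product (X|Y) = Y*AX. Find an orthonormal basis for V, by applying the Gram-Schmidt process to the basis {X_1, X_2} defined by

$$X_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad X_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}.$$

(c) Find an invertible 2 × 2 real matrix P such that A = P^tP.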


6. Which of the following matrices are positive?

[matrices illegible in source]

7. Give an example of an n × n matrix which has all its principal minors positive, but which is not a positive matrix.


8. Does $((x_1, x_2)|(y_1, y_2)) = x_1\bar{y}_1 + 2x_2\bar{y}_1 + 2x_1\bar{y}_2 + x_2\bar{y}_2$ define an inner product on C²?


9. Prove that every entry on the main diagonal of a positive matrix is positive. 


10. Let V be a finite-dimensional inner product space. If T and U are linear operators on V, we write T < U if U − T is a positive operator. Prove the following:

(a) T < U and U < T is impossible.

(b) If T < U and U < S, then T < S.

(c) If T < U and 0 < S, it need not be that ST < SU.


11. Let V be a finite-dimensional inner product space and E the orthogonal projection of V onto some subspace.

(a) Prove that, for any positive number c, the operator cI + E is positive.
(b) Express in terms of E a self-adjoint linear operator T such that T² = I + E.


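12. Let n be a positive integer and A the n × n matrix

$$A = \begin{bmatrix} 1 & \frac{1}{2} & \cdots & \frac{1}{n} \\ \frac{1}{2} & \frac{1}{3} & \cdots & \frac{1}{n+1} \\ \vdots & \vdots & & \vdots \\ \frac{1}{n} & \frac{1}{n+1} & \cdots & \frac{1}{2n-1} \end{bmatrix}.$$

Prove that A is positive.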
13. Let A be a self-adjoint n × n matrix. Prove that there is a real number c such that the matrix cI + A is positive.

14. Prove that the product of two positive linear operators is positive if and only if they commute.


15. Let S and T be positive operators. Prove that every characteristic value of 
ST is positive. 


9.4. More on Forms 


This section contains two results which give more detailed information 
about (sesqui-linear) forms. 


Theorem 7. Let f be a form on a real or complex vector space V and {α_1, ..., α_r} a basis for the finite-dimensional subspace W of V. Let M be the r × r matrix with entries

$$M_{jk} = f(\alpha_k, \alpha_j)$$

and W' the set of all vectors β in V such that f(α, β) = 0 for all α in W. Then W' is a subspace of V, and W ∩ W' = {0} if and only if M is invertible. When this is the case, V = W + W'.


Proof. If β and γ are vectors in W' and c is a scalar, then for every α in W

$$f(\alpha, c\beta + \gamma) = \bar{c}f(\alpha, \beta) + f(\alpha, \gamma) = 0.$$

Hence, W' is a subspace of V.

Now suppose $\alpha = \sum_{k=1}^{r} x_k\alpha_k$ and that $\beta = \sum_{j=1}^{r} y_j\alpha_j$. Then

$$f(\alpha, \beta) = \sum_{j,k} \bar{y}_jM_{jk}x_k = \sum_k \Big(\sum_j \bar{y}_jM_{jk}\Big)x_k.$$

It follows from this that W ∩ W' ≠ {0} if and only if the homogeneous system

$$\sum_{j=1}^{r} \bar{y}_jM_{jk} = 0, \qquad 1 \le k \le r,$$

has a non-trivial solution (y_1, ..., y_r). Hence W ∩ W' = {0} if and only if M* is invertible. But the invertibility of M* is equivalent to the invertibility of M.
Suppose that M is invertible and let

$$A = (M^*)^{-1} = (M^{-1})^*.$$

Define g_j on V by the equation

$$g_j(\beta) = \sum_k A_{jk}\overline{f(\alpha_k, \beta)}.$$

Then

$$\begin{aligned} g_j(c\beta + \gamma) &= \sum_k A_{jk}\overline{f(\alpha_k, c\beta + \gamma)} \\ &= c\sum_k A_{jk}\overline{f(\alpha_k, \beta)} + \sum_k A_{jk}\overline{f(\alpha_k, \gamma)} \\ &= c\,g_j(\beta) + g_j(\gamma). \end{aligned}$$

Hence, each g_j is a linear function on V. Thus we may define a linear operator E on V by setting

$$E\beta = \sum_j g_j(\beta)\alpha_j.$$

Since

$$g_j(\alpha_n) = \sum_k A_{jk}\overline{f(\alpha_k, \alpha_n)} = \sum_k A_{jk}(M^*)_{kn} = \delta_{jn},$$

it follows that E(α_n) = α_n for 1 ≤ n ≤ r. This implies Eα = α for every α in W. Therefore, E maps V onto W and E² = E. If β is an arbitrary vector in V, then

$$\begin{aligned} f(\alpha_n, E\beta) &= f\Big(\alpha_n, \sum_j g_j(\beta)\alpha_j\Big) \\ &= \sum_j \overline{g_j(\beta)}\,f(\alpha_n, \alpha_j) \\ &= \sum_j \Big(\sum_k \bar{A}_{jk}f(\alpha_k, \beta)\Big)f(\alpha_n, \alpha_j). \end{aligned}$$

Since A* = M^{-1}, it follows that

$$\begin{aligned} f(\alpha_n, E\beta) &= \sum_k \Big(\sum_j (M^{-1})_{kj}M_{jn}\Big)f(\alpha_k, \beta) \\ &= \sum_k \delta_{kn}f(\alpha_k, \beta) \\ &= f(\alpha_n, \beta). \end{aligned}$$

This implies f(α, Eβ) = f(α, β) for every α in W. Hence

$$f(\alpha, \beta - E\beta) = 0$$

for all α in W and β in V. Thus I − E maps V into W'. The equation

$$\beta = E\beta + (I - E)\beta$$

shows that V = W + W'. One final point should be mentioned. Since W ∩ W' = {0}, every vector in V is uniquely the sum of a vector in W and a vector in W'.

Operators on Inner Product Spaces Chap. 9

If β is in W', it follows that Eβ = 0. Hence I − E maps V onto W'. ∎
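The proof gives an explicit recipe for E. Its coefficients are c = A h, where A = (M*)^{-1} and h_k = f(α_k, β), and solving M* c = h avoids forming an inverse. The sketch below restricts to the real case, so all conjugations disappear. It uses a hypothetical non-symmetric form f(x, y) = y^t G x on R³ and a two-dimensional W, and the matrix G, the basis and the helper names are all our own choices for illustration. It checks that Eβ lies in W, that β − Eβ lies in W', and that E² = E.

```python
from fractions import Fraction


def solve(M, b):
    """Solve M c = b exactly by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(bi)] for row, bi in zip(M, b)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(n):
            if r != c and aug[r][c] != 0:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def form(G, x, y):
    """Hypothetical real form f(x, y) = y^t G x on R^n (G need not be symmetric)."""
    n = len(G)
    return sum(y[i] * G[i][j] * x[j] for i in range(n) for j in range(n))


def projection(G, basis, beta):
    """E(beta) for V = W + W' as in Theorem 7, with W spanned by `basis` (real case)."""
    r = len(basis)
    # Mt[k][j] = f(alpha_k, alpha_j) = M_jk, i.e. Mt is the transpose M^t (= M* here).
    Mt = [[form(G, basis[k], basis[j]) for j in range(r)] for k in range(r)]
    h = [form(G, basis[k], beta) for k in range(r)]
    c = solve(Mt, h)                      # c = (M^t)^(-1) h = A h
    return [sum(c[j] * basis[j][i] for j in range(r)) for i in range(len(beta))]


G = [[1, 2, 0], [0, 1, 0], [1, 0, 3]]
basis = [[1, 0, 0], [0, 1, 1]]            # M = [[1, 2], [1, 4]] is invertible
beta = [1, 2, 3]
E_beta = projection(G, basis, beta)
print(E_beta)   # [Fraction(3, 2), Fraction(5, 2), Fraction(5, 2)]
```
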


The projection E constructed in the proof may be characterized as follows: Eβ = α if and only if α is in W and β − α belongs to W'. Thus E is independent of the basis of W that was used in its construction. Hence we may refer to E as the projection of V on W that is determined by the direct sum decomposition

$$V = W \oplus W'.$$

Note that E is an orthogonal projection if and only if W' = W^⊥.


Theorem 8. Let f be a form on a real or complex vector space V and A the matrix of f in the ordered basis {α_1, ..., α_n} of V. Suppose the principal minors of A are all different from 0. Then there is a unique upper-triangular matrix P with P_kk = 1 (1 ≤ k ≤ n) such that

$$P^*AP$$

is upper-triangular.


Proof. Since $\Delta_k(A^*) = \overline{\Delta_k(A)}$ (1 ≤ k ≤ n), the principal minors of A* are all different from 0. Hence, by the lemma used in the proof of Theorem 6, there exists an upper-triangular matrix P with P_kk = 1 such that A*P is lower-triangular. Therefore, P*A = (A*P)* is upper-triangular. Since the product of two upper-triangular matrices is again upper-triangular, it follows that P*AP is upper-triangular. This shows the existence but not the uniqueness of P. However, there is another more geometric argument which may be used to prove both the existence and uniqueness of P.

Let W_k be the subspace spanned by α_1, ..., α_k and W'_k the set of all β in V such that f(α, β) = 0 for every α in W_k. Since Δ_k(A) ≠ 0, the k × k matrix M with entries

$$M_{ij} = f(\alpha_j, \alpha_i) = A_{ij}$$

(1 ≤ i, j ≤ k) is invertible. By Theorem 7

$$V = W_k \oplus W'_k.$$

Let E_k be the projection of V on W_k which is determined by this decomposition, and set E_0 = 0. Let

$$\beta_k = \alpha_k - E_{k-1}\alpha_k \qquad (1 \le k \le n).$$

Then β_1 = α_1, and E_{k−1}α_k belongs to W_{k−1} for k > 1. Thus when k > 1, there exist unique scalars P_jk such that

$$E_{k-1}\alpha_k = -\sum_{j=1}^{k-1} P_{jk}\alpha_j.$$

Setting P_kk = 1 and P_jk = 0 for j > k, we then have an n × n upper-triangular matrix P with P_kk = 1 and

$$\beta_k = \sum_{j=1}^{k} P_{jk}\alpha_j$$

for k = 1, ..., n. Suppose 1 ≤ i < k. Then β_i is in W_i and W_i ⊂ W_{k−1}. Since β_k = (I − E_{k−1})α_k belongs to W'_{k−1}, it follows that f(β_i, β_k) = 0. Let B denote the matrix of f in the ordered basis {β_1, ..., β_n}. Then

$$B_{ki} = f(\beta_i, \beta_k),$$

so B_ki = 0 when k > i. Thus B is upper-triangular. On the other hand, B = P*AP.

Conversely, suppose P is an upper-triangular matrix with P_kk = 1 such that P*AP is upper-triangular. Set

$$\beta_k = \sum_j P_{jk}\alpha_j \qquad (1 \le k \le n).$$

Then {β_1, ..., β_k} is evidently a basis for W_k. Suppose k > 1. Then {β_1, ..., β_{k−1}} is a basis for W_{k−1}, and since f(β_i, β_k) = 0 when i < k, we see that β_k is a vector in W'_{k−1}. The equation defining β_k implies

$$\alpha_k = -\Big(\sum_{j=1}^{k-1} P_{jk}\alpha_j\Big) + \beta_k.$$

Now $\sum_{j=1}^{k-1} P_{jk}\alpha_j$ belongs to W_{k−1} and β_k is in W'_{k−1}. Therefore, P_{1k}, ..., P_{k−1,k} are the unique scalars such that

$$E_{k-1}\alpha_k = -\sum_{j=1}^{k-1} P_{jk}\alpha_j,$$

so that P is the matrix constructed earlier. ∎


9.5. Spectral Theory 


In this section, we pursue the implications of Theorems 18 and 22 
of Chapter 8 concerning the diagonalization of self-adjoint and normal 
operators. 


Theorem 9 (Spectral Theorem). Let T be a normal operator on a finite-dimensional complex inner product space V or a self-adjoint operator on a finite-dimensional real inner product space V. Let c_1, ..., c_k be the distinct characteristic values of T. Let W_j be the characteristic space associated with c_j and E_j the orthogonal projection of V on W_j. Then W_j is orthogonal to W_i when i ≠ j, V is the direct sum of W_1, ..., W_k, and

$$T = c_1E_1 + \cdots + c_kE_k. \tag{9-11}$$




Proof. Let α be a vector in W_j, β a vector in W_i, and suppose i ≠ j. Then $c_j(\alpha|\beta) = (T\alpha|\beta) = (\alpha|T^*\beta) = (\alpha|\bar{c}_i\beta)$, since for a normal operator Tβ = c_iβ implies $T^*\beta = \bar{c}_i\beta$. Hence (c_j − c_i)(α|β) = 0, and since c_j − c_i ≠ 0, it follows that (α|β) = 0. Thus W_j is orthogonal to W_i when i ≠ j. From the fact that V has an orthonormal basis consisting of characteristic vectors (cf. Theorems 18 and 22 of Chapter 8), it follows that V = W_1 + ⋯ + W_k. If α_j belongs to W_j (1 ≤ j ≤ k) and α_1 + ⋯ + α_k = 0, then

$$0 = \Big(\alpha_i \Big| \sum_j \alpha_j\Big) = \sum_j (\alpha_i|\alpha_j) = \|\alpha_i\|^2$$

for every i, so that V is the direct sum of W_1, ..., W_k. Therefore E_1 + ⋯ + E_k = I and

$$T = TE_1 + \cdots + TE_k = c_1E_1 + \cdots + c_kE_k. \qquad \blacksquare$$


The decomposition (9-11) is called the spectral resolution of T. This terminology arose in part from physical applications which caused the spectrum of a linear operator on a finite-dimensional vector space to be defined as the set of characteristic values for the operator. It is important to note that the orthogonal projections E_1, ..., E_k are canonically associated with T; in fact, they are polynomials in T.


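Corollary. If $e_j = \prod_{i \ne j}\left(\frac{x - c_i}{c_j - c_i}\right)$, then E_j = e_j(T) for 1 ≤ j ≤ k.

Proof. Since E_iE_j = 0 when i ≠ j, it follows that

$$T^2 = c_1^2E_1 + \cdots + c_k^2E_k$$

and by an easy induction argument that

$$T^n = c_1^nE_1 + \cdots + c_k^nE_k$$

for every integer n ≥ 0. For an arbitrary polynomial

$$f = \sum_{n=0}^{r} a_nx^n$$

we have

$$\begin{aligned} f(T) &= \sum_{n=0}^{r} a_nT^n = \sum_{n=0}^{r} a_n\sum_{j=1}^{k} c_j^nE_j \\ &= \sum_{j=1}^{k}\Big(\sum_{n=0}^{r} a_nc_j^n\Big)E_j = \sum_{j=1}^{k} f(c_j)E_j. \end{aligned}$$

Since e_j(c_m) = δ_jm, it follows that e_j(T) = E_j. ∎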
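The corollary says that each projection E_j is a Lagrange interpolation polynomial evaluated at T. The exact-arithmetic sketch below assumes the distinct characteristic values are already known, since computing them is a separate problem, and its helper names are hypothetical. It evaluates e_j(A) for the self-adjoint matrix with rows (2, 1) and (1, 2), whose characteristic values are 1 and 3. It then checks the conclusions of Theorem 9: E_1 + E_2 = I, c_1E_1 + c_2E_2 = A, E_1E_2 = 0, and each E_j is a self-adjoint idempotent.

```python
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def spectral_projection(A, eigenvalues, j):
    """E_j = e_j(A) = prod_{i != j} (A - c_i I) / (c_j - c_i), computed exactly."""
    n = len(A)
    E = [[Fraction(int(r == s)) for s in range(n)] for r in range(n)]
    for i, ci in enumerate(eigenvalues):
        if i != j:
            scale = Fraction(1, eigenvalues[j] - ci)
            factor = [[(A[r][s] - (ci if r == s else 0)) * scale for s in range(n)]
                      for r in range(n)]
            E = matmul(E, factor)
    return E


A = [[2, 1], [1, 2]]   # self-adjoint; distinct characteristic values 1 and 3
c = [1, 3]
E = [spectral_projection(A, c, j) for j in range(len(c))]
print(E[0])  # [[Fraction(1, 2), Fraction(-1, 2)], [Fraction(-1, 2), Fraction(1, 2)]]
print(E[1])  # [[Fraction(1, 2), Fraction(1, 2)], [Fraction(1, 2), Fraction(1, 2)]]

identity_sum = [[E[0][r][s] + E[1][r][s] for s in range(2)] for r in range(2)]
resolution = [[c[0] * E[0][r][s] + c[1] * E[1][r][s] for s in range(2)] for r in range(2)]
```
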
Because E_1, ..., E_k are canonically associated with T and

$$I = E_1 + \cdots + E_k,$$

the family of projections {E_1, ..., E_k} is called the resolution of the identity defined by T.

There is a comment that should be made about the proof of the spectral 
theorem. We derived the theorem using Theorems 18 and 22 of Chapter 8 
on the diagonalization of self-adjoint and normal operators. There is an- 
other, more algebraic, proof in which it must first be shown that the mini- 
mal polynomial of a normal operator is a product of distinct prime factors. 
Then one proceeds as in the proof of the primary decomposition theorem 
(Theorem 12, Chapter 6). We shall give such a proof in the next section. 

In various applications it is necessary to know whether one may 
compute certain functions of operators or matrices, e.g., square roots. 
This may be done rather simply for diagonalizable normal operators. 


Definition. Let T be a diagonalizable normal operator on a finite-dimensional inner product space and

$$T = \sum_{j=1}^{k} c_jE_j$$

its spectral resolution. Suppose f is a function whose domain includes the spectrum of T that has values in the field of scalars. Then the linear operator f(T) is defined by the equation

$$f(T) = \sum_{j=1}^{k} f(c_j)E_j. \tag{9-12}$$


Theorem 10. Let T be a diagonalizable normal operator with spectrum S on a finite-dimensional inner product space V. Suppose f is a function whose domain contains S that has values in the field of scalars. Then f(T) is a diagonalizable normal operator with spectrum f(S). If U is a unitary map of V onto V' and T' = UTU^{-1}, then S is the spectrum of T' and

$$f(T') = Uf(T)U^{-1}.$$


Proof. The normality of f(T) follows by a simple computation from (9-12) and the fact that

$$f(T)^* = \sum_j \overline{f(c_j)}E_j.$$

Moreover, it is clear that for every α in E_j(V)

$$f(T)\alpha = f(c_j)\alpha.$$

Thus, the set f(S) of all f(c) with c in S is contained in the spectrum of f(T). Conversely, suppose α ≠ 0 and that

$$f(T)\alpha = b\alpha.$$

Then $\alpha = \sum_j E_j\alpha$ and

$$f(T)\alpha = \sum_j f(T)E_j\alpha = \sum_j f(c_j)E_j\alpha = \sum_j bE_j\alpha.$$

Hence,

$$\Big\|\sum_j (f(c_j) - b)E_j\alpha\Big\|^2 = \sum_j |f(c_j) - b|^2\|E_j\alpha\|^2 = 0.$$

Therefore, f(c_j) = b or E_jα = 0. By assumption, α ≠ 0, so there exists an index i such that E_iα ≠ 0. It follows that f(c_i) = b and hence that f(S) is the spectrum of f(T). Suppose, in fact, that

$$f(S) = \{b_1, \dots, b_r\}$$

where b_m ≠ b_n when m ≠ n. Let X_m be the set of indices i such that 1 ≤ i ≤ k and f(c_i) = b_m. Let $P_m = \sum_i E_i$, the sum being extended over the indices i in X_m. Then P_m is the orthogonal projection of V on the subspace of characteristic vectors belonging to the characteristic value b_m of f(T), and

$$f(T) = \sum_{m=1}^{r} b_mP_m$$

is the spectral resolution of f(T).

Now suppose U is a unitary transformation of V onto V' and that T' = UTU^{-1}. Then the equation

$$T\alpha = c\alpha$$

holds if and only if

$$T'U\alpha = cU\alpha.$$

Thus S is the spectrum of T', and U maps each characteristic subspace for T onto the corresponding subspace for T'. In fact, using (9-12), we see that

$$T' = \sum_j c_jE'_j, \qquad E'_j = UE_jU^{-1},$$

is the spectral resolution of T'. Hence

$$\begin{aligned} f(T') &= \sum_j f(c_j)E'_j = \sum_j f(c_j)UE_jU^{-1} \\ &= U\Big(\sum_j f(c_j)E_j\Big)U^{-1} = Uf(T)U^{-1}. \end{aligned} \qquad \blacksquare$$
In thinking about the preceding discussion, it is important for one to keep in mind that the spectrum of the normal operator T is the set

$$S = \{c_1, \dots, c_k\}$$

of distinct characteristic values. When T is represented by a diagonal matrix in a basis of characteristic vectors, it is necessary to repeat each value c_j as many times as the dimension of the corresponding space of characteristic vectors. This is the reason for the change of notation in the following result.


Corollary. With the assumptions of Theorem 10, suppose that T is represented in the ordered basis 𝔅 = {α_1, ..., α_n} by the diagonal matrix D with entries d_1, ..., d_n. Then, in the basis 𝔅, f(T) is represented by the diagonal matrix f(D) with entries f(d_1), ..., f(d_n). If 𝔅' = {α'_1, ..., α'_n} is any other ordered basis and P the matrix such that

$$\alpha'_j = \sum_i P_{ij}\alpha_i,$$

then P^{-1}f(D)P is the matrix of f(T) in the basis 𝔅'.

Proof. For each index i, there is a unique j such that 1 ≤ j ≤ k, α_i belongs to E_j(V), and d_i = c_j. Hence f(T)α_i = f(d_i)α_i for every i, and

$$\begin{aligned} f(T)\alpha'_j &= \sum_i P_{ij}f(T)\alpha_i \\ &= \sum_i f(d_i)P_{ij}\alpha_i \\ &= \sum_i (f(D)P)_{ij}\alpha_i \\ &= \sum_i (f(D)P)_{ij}\sum_k (P^{-1})_{ki}\alpha'_k \\ &= \sum_k (P^{-1}f(D)P)_{kj}\alpha'_k. \end{aligned} \qquad \blacksquare$$


It follows from this result that one may form certain functions of a normal matrix. For suppose A is a normal matrix. Then there is an invertible matrix P, in fact a unitary P, such that PAP^{-1} is a diagonal matrix, say D with entries d_1, ..., d_n. Let f be a complex-valued function which can be applied to d_1, ..., d_n, and let f(D) be the diagonal matrix with entries f(d_1), ..., f(d_n). Then P^{-1}f(D)P is independent of D and just a function of A in the following sense. If Q is another invertible matrix such that QAQ^{-1} is a diagonal matrix D', then f may be applied to the diagonal entries of D' and

$$P^{-1}f(D)P = Q^{-1}f(D')Q.$$

Definition. Under the above conditions, f(A) is defined as P^{-1}f(D)P.


The matrix f(A) may also be characterized in a different way. In doing this, we state without proof some of the results on normal matrices that one obtains by formulating the matrix analogues of the preceding theorems.

Theorem 11. Let A be a normal matrix and c_1, ..., c_k the distinct complex roots of det(xI − A). Let

$$e_i = \prod_{j \ne i}\left(\frac{x - c_j}{c_i - c_j}\right)$$

and E_i = e_i(A) (1 ≤ i ≤ k). Then E_iE_j = 0 when i ≠ j, E_i² = E_i, E_i* = E_i, and

$$I = E_1 + \cdots + E_k.$$

If f is a complex-valued function whose domain includes c_1, ..., c_k, then

$$f(A) = f(c_1)E_1 + \cdots + f(c_k)E_k;$$

in particular, A = c_1E_1 + ⋯ + c_kE_k.
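Once the projections E_i = e_i(A) are known, Theorem 11 reduces f(A) to finitely many scalars f(c_i). The sketch below assumes the distinct characteristic values are supplied by the caller, and its helper names are hypothetical. It uses the normal but non-self-adjoint rotation matrix with rows (0, −1) and (1, 0), whose characteristic values are i and −i. Taking f(z) = z̄ recovers A*, consistent with f(T)* = Σ_j f(c_j)‾ E_j from the proof of Theorem 10, and taking f(z) = z² recovers A² = −I. Floating-point complex arithmetic is used, so comparisons should allow a small tolerance.

```python
def matmul(X, Y):
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def spectral_projection(A, eigenvalues, j):
    """E_j = e_j(A) = prod_{i != j} (A - c_i I) / (c_j - c_i), as in Theorem 11."""
    n = len(A)
    E = [[complex(r == s) for s in range(n)] for r in range(n)]
    for i, ci in enumerate(eigenvalues):
        if i != j:
            scale = 1 / (eigenvalues[j] - ci)
            factor = [[(A[r][s] - (ci if r == s else 0)) * scale for s in range(n)]
                      for r in range(n)]
            E = matmul(E, factor)
    return E


def matrix_function(A, eigenvalues, f):
    """f(A) = f(c_1) E_1 + ... + f(c_k) E_k for a normal A with known distinct eigenvalues."""
    n = len(A)
    result = [[0j] * n for _ in range(n)]
    for j, cj in enumerate(eigenvalues):
        E = spectral_projection(A, eigenvalues, j)
        result = [[result[r][s] + f(cj) * E[r][s] for s in range(n)] for r in range(n)]
    return result


A = [[0, -1], [1, 0]]   # normal (a rotation); characteristic values i and -i
c = [1j, -1j]

adjoint = matrix_function(A, c, lambda z: z.conjugate())  # should equal A* = [[0, 1], [-1, 0]]
square = matrix_function(A, c, lambda z: z ** 2)           # should equal A^2 = -I
```
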


We recall that an operator T on an inner product space V is non-negative if T is self-adjoint and (Tα|α) ≥ 0 for every α in V.


Theorem 12. Let T be a diagonalizable normal operator on a finite-dimensional inner product space V. Then T is self-adjoint, non-negative, or unitary according as each characteristic value of T is real, non-negative, or of absolute value 1.

Proof. Suppose T has the spectral resolution T = c_1E_1 + ⋯ + c_kE_k; then $T^* = \bar{c}_1E_1 + \cdots + \bar{c}_kE_k$. To say T is self-adjoint is to say T = T*, or

$$(c_1 - \bar{c}_1)E_1 + \cdots + (c_k - \bar{c}_k)E_k = 0.$$

Using the fact that E_iE_j = 0 for i ≠ j, and the fact that no E_j is the zero operator, we see that T is self-adjoint if and only if $c_j = \bar{c}_j$, j = 1, ..., k. To distinguish the normal operators which are non-negative, let us look at

$$\begin{aligned} (T\alpha|\alpha) &= \Big(\sum_{j=1}^{k} c_jE_j\alpha \Big| \sum_{i=1}^{k} E_i\alpha\Big) \\ &= \sum_i\sum_j c_j(E_j\alpha|E_i\alpha) \\ &= \sum_j c_j\|E_j\alpha\|^2. \end{aligned}$$

We have used the fact that (E_jα|E_iα) = 0 for i ≠ j. From this it is clear that the condition (Tα|α) ≥ 0 is satisfied if and only if c_j ≥ 0 for each j (for the "only if," take α ≠ 0 in E_j(V)). To distinguish the unitary operators, observe that

$$TT^* = c_1\bar{c}_1E_1 + \cdots + c_k\bar{c}_kE_k = |c_1|^2E_1 + \cdots + |c_k|^2E_k.$$

If TT* = I, then I = |c_1|²E_1 + ⋯ + |c_k|²E_k, and operating with E_j

$$E_j = |c_j|^2E_j.$$

Since E_j ≠ 0, we have |c_j|² = 1 or |c_j| = 1. Conversely, if |c_j|² = 1 for each j, it is clear that TT* = I. ∎


It is important to note that this is a theorem about normal operators. If T is a general linear operator on V which has real characteristic values, it does not follow that T is self-adjoint. For example, the operator on C² whose matrix in the standard basis has rows (1, 1) and (0, 1) has the single characteristic value 1, but it is not self-adjoint. The theorem states that if T has real characteristic values, and if T is diagonalizable and normal, then T is self-adjoint. A theorem of this type serves to strengthen the analogy between the adjoint operation and the process of forming the conjugate of a complex number. A complex number z is real or of absolute value 1 according as $z = \bar{z}$, or $\bar{z}z = 1$. An operator T is self-adjoint or unitary according as T = T* or T*T = I.

We are going to prove two theorems now, which are the analogues of 
these two statements: 


(1) Every non-negative number has a unique non-negative square 
root. 

(2) Every complex number is expressible in the form ru, where r is non-negative and |u| = 1. This is the polar decomposition z = re^{iθ} for complex numbers.


Theorem 13. Let V be a finite-dimensional inner product space and T a non-negative operator on V. Then T has a unique non-negative square root, that is, there is one and only one non-negative operator N on V such that N² = T.


Proof. Let T = c_1E_1 + ⋯ + c_kE_k be the spectral resolution of T. By Theorem 12, each c_j ≥ 0. If c is any non-negative real number, let √c denote the non-negative square root of c. Then according to Theorem 11 and (9-12) N = √T is a well-defined diagonalizable normal operator on V. It is non-negative by Theorem 12, and, by an obvious computation, N² = T.

Now let P be a non-negative operator on V such that P² = T. We shall prove that P = N. Let

$$P = d_1F_1 + \cdots + d_rF_r$$

be the spectral resolution of P. Then d_j ≥ 0 for each j, since P is non-negative. From P² = T we have

$$T = d_1^2F_1 + \cdots + d_r^2F_r.$$

Now F_1, ..., F_r satisfy the conditions I = F_1 + ⋯ + F_r, F_iF_j = 0 for i ≠ j, and no F_j is 0. The numbers d_1², ..., d_r² are distinct, because distinct non-negative numbers have distinct squares. By the uniqueness of the spectral resolution of T, we must have r = k, and (perhaps reordering) F_j = E_j, d_j² = c_j. Thus P = N. ∎
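The existence half of the proof builds N = Σ √c_j E_j from the spectral resolution. The sketch below carries this out exactly for the matrix with rows (5, 4) and (4, 5), whose characteristic values are 1 and 9. It assumes the characteristic values are known integers that are perfect squares, a restriction that only serves to keep the arithmetic exact, and its helper names are hypothetical. It also shows why the word "non-negative" matters in the uniqueness claim: A has other self-adjoint square roots, but none of them is non-negative.

```python
import math
from fractions import Fraction


def matmul(X, Y):
    return [[sum(X[i][r] * Y[r][j] for r in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def spectral_projection(A, eigenvalues, j):
    """E_j = prod_{i != j} (A - c_i I) / (c_j - c_i), computed exactly with Fractions."""
    n = len(A)
    E = [[Fraction(int(r == s)) for s in range(n)] for r in range(n)]
    for i, ci in enumerate(eigenvalues):
        if i != j:
            scale = Fraction(1, eigenvalues[j] - ci)
            factor = [[(A[r][s] - (ci if r == s else 0)) * scale for s in range(n)]
                      for r in range(n)]
            E = matmul(E, factor)
    return E


def nonnegative_sqrt(A, eigenvalues):
    """N = sum_j sqrt(c_j) E_j; this sketch only accepts perfect-square integer eigenvalues."""
    n = len(A)
    N = [[Fraction(0)] * n for _ in range(n)]
    for j, cj in enumerate(eigenvalues):
        root = math.isqrt(cj)
        if root * root != cj:
            raise ValueError("this sketch only handles perfect-square eigenvalues")
        E = spectral_projection(A, eigenvalues, j)
        N = [[N[r][s] + root * E[r][s] for s in range(n)] for r in range(n)]
    return N


def is_nonnegative_2x2(S):
    """A real symmetric 2 x 2 matrix is non-negative iff its trace and determinant are >= 0."""
    return (S[0][1] == S[1][0] and S[0][0] + S[1][1] >= 0
            and S[0][0] * S[1][1] - S[0][1] * S[1][0] >= 0)


A = [[5, 4], [4, 5]]                 # characteristic values 1 and 9
N = nonnegative_sqrt(A, [1, 9])
print(N)  # [[Fraction(2, 1), Fraction(1, 1)], [Fraction(1, 1), Fraction(2, 1)]]

# A has other self-adjoint square roots, but none of them is non-negative.
others = [[[1, 2], [2, 1]], [[-1, -2], [-2, -1]], [[-2, -1], [-1, -2]]]
print([matmul(S, S) == A for S in others])      # [True, True, True]
print([is_nonnegative_2x2(S) for S in others])  # [False, False, False]
```
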
Theorem 14. Let V be a finite-dimensional inner product space and let T be any linear operator on V. Then there exist a unitary operator U on V and a non-negative operator N on V such that T = UN. The non-negative operator N is unique. If T is invertible, the operator U is also unique.


Proof. Suppose we have T = UN, where U is unitary and N 
is non-negative. Then T* = (UN)* = N*U* = NU*. Thus T*T = 
NU*UN = N?. This shows that N is uniquely determined as the non- 
negative square root of the non-negative operator T*T. 

So, to begin the proof of the existence of U and N, we use Theorem 
13 to define N as the unique non-negative square root of T*T. If T is 
invertible, then so is N because 


(Na|Na) = (Nala) = (T*Tala) = (Ta|Ta). 


In this case, we define U = TN! and prove that U is unitary. Now 
U* = (TN-1)* = (N7)*7* = (N*)-1T* = N-'!T*, Thus 


UU* = TN-1N-'T* 

= T(N-)27"* 
T(N2)-1T* 
T(T*T)-T* 
TT-\(T*)-1T* 
=f 


ll 


and U is unitary. 

If $T$ is not invertible, we shall have to do a bit more work to define $U$. We first define $U$ on the range of $N$. Let $\alpha$ be a vector in the range of $N$, say $\alpha = N\beta$. We define $U\alpha = T\beta$, motivated by the fact that we want $UN\beta = T\beta$. We must verify that $U$ is well-defined on the range of $N$; in other words, if $N\beta' = N\beta$ then $T\beta' = T\beta$. We verified above that $\|N\gamma\|^2 = \|T\gamma\|^2$ for every $\gamma$ in $V$. Thus, with $\gamma = \beta - \beta'$, we see that $N(\beta - \beta') = 0$ if and only if $T(\beta - \beta') = 0$. So $U$ is well-defined on the range of $N$ and is clearly linear where defined. Now if $W$ is the range of $N$, we are going to define $U$ on $W^\perp$. To do this, we need the following observation. Since $T$ and $N$ have the same null space, their ranges have the same dimension. Thus $W^\perp$ has the same dimension as the orthogonal complement of the range of $T$. Therefore, there exists an (inner product space) isomorphism $U_0$ of $W^\perp$ onto $T(V)^\perp$. Now we have defined $U$ on $W$, and we define $U$ on $W^\perp$ to be $U_0$.

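Let us repeat the definition of $U$. Since $V = W \oplus W^\perp$, each $\alpha$ in $V$ is uniquely expressible in the form $\alpha = N\beta + \gamma$, where $N\beta$ is in the range $W$ of $N$, and $\gamma$ is in $W^\perp$. We define

$$U\alpha = T\beta + U_0\gamma.$$

This $U$ is clearly linear, and we verified above that it is well-defined. Also, since $T\beta$ lies in $T(V)$ while $U_0\gamma$ lies in $T(V)^\perp$, and since $N\beta$ is orthogonal to $\gamma$,

$$\begin{aligned} (U\alpha|U\alpha) &= (T\beta + U_0\gamma|T\beta + U_0\gamma) \\ &= (T\beta|T\beta) + (U_0\gamma|U_0\gamma) \\ &= (N\beta|N\beta) + (\gamma|\gamma) \\ &= (\alpha|\alpha) \end{aligned}$$

and so $U$ is unitary. We also have $UN\beta = T\beta$ for each $\beta$. ∎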
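For a real invertible $2 \times 2$ matrix, the construction used in the invertible case of the proof, $N = \sqrt{T^*T}$ and $U = TN^{-1}$, can be carried out directly. The Python sketch below assumes real scalars, so $T^*$ is the transpose and "unitary" means "orthogonal"; it uses a closed-form square root that is valid only for $2 \times 2$ positive definite matrices, and all function names are hypothetical.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]


def inverse(x):
    (a, b), (c, d) = x
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def sqrt_psd_2x2(m):
    """Non-negative square root of a symmetric positive definite 2 x 2 matrix."""
    (a, b), (c, d) = m
    det_p = math.sqrt(a * d - b * c)
    tr_p = math.sqrt(a + d + 2.0 * det_p)
    return [[(a + det_p) / tr_p, b / tr_p], [c / tr_p, (d + det_p) / tr_p]]


def polar(t):
    """Return (U, N) with t = U N, N = sqrt(t^T t) non-negative, U orthogonal.

    Follows the invertible case of Theorem 14, so t must be invertible.
    """
    n = sqrt_psd_2x2(matmul(transpose(t), t))
    u = matmul(t, inverse(n))
    return u, n
```

For instance, with $T = \begin{bmatrix} 0 & -2 \\ 1 & 0 \end{bmatrix}$ one gets $N = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix}$ and $U = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}$, a rotation through a right angle.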


We call $T = UN$ a polar decomposition for $T$. We certainly cannot call it the polar decomposition, since $U$ is not unique. Even when $T$ is invertible, so that $U$ is unique, we have the difficulty that $U$ and $N$ may not commute. Indeed, they commute if and only if $T$ is normal. For example, if $T = UN = NU$, with $N$ non-negative and $U$ unitary, then

$$TT^* = (NU)(NU)^* = NUU^*N = N^2 = T^*T.$$

The general operator $T$ will also have a decomposition $T = N_1U_1$, with $N_1$ non-negative and $U_1$ unitary. Here, $N_1$ will be the non-negative square root of $TT^*$. We can obtain this result by applying the theorem just proved to the operator $T^*$, and then taking adjoints.

We turn now to the problem of what can be said about the simultane- 
ous diagonalization of commuting families of normal operators. For this 
purpose the following terminology is appropriate. 


Definitions. Let $\mathcal{F}$ be a family of operators on an inner product space $V$. A function $r$ on $\mathcal{F}$ with values in the field $F$ of scalars will be called a root of $\mathcal{F}$ if there is a non-zero $\alpha$ in $V$ such that

$$T\alpha = r(T)\alpha$$

for all $T$ in $\mathcal{F}$. For any function $r$ from $\mathcal{F}$ to $F$, let $V(r)$ be the set of all $\alpha$ in $V$ such that $T\alpha = r(T)\alpha$ for every $T$ in $\mathcal{F}$.


Then $V(r)$ is a subspace of $V$, and $r$ is a root of $\mathcal{F}$ if and only if $V(r) \neq \{0\}$. Each non-zero $\alpha$ in $V(r)$ is simultaneously a characteristic vector for every $T$ in $\mathcal{F}$.


Theorem 15. Let $\mathcal{F}$ be a commuting family of diagonalizable normal operators on a finite-dimensional inner product space $V$. Then $\mathcal{F}$ has only a finite number of roots. If $r_1, \ldots, r_k$ are the distinct roots of $\mathcal{F}$, then

(i) $V(r_i)$ is orthogonal to $V(r_j)$ when $i \neq j$, and
(ii) $V = V(r_1) \oplus \cdots \oplus V(r_k)$.


Proof. Suppose $r$ and $s$ are distinct roots of $\mathcal{F}$. Then there is an operator $T$ in $\mathcal{F}$ such that $r(T) \neq s(T)$. Since characteristic vectors belonging to distinct characteristic values of $T$ are necessarily orthogonal, it follows that $V(r)$ is orthogonal to $V(s)$. Because $V$ is finite-dimensional, this implies $\mathcal{F}$ has at most a finite number of roots. Let $r_1, \ldots, r_k$ be the roots of $\mathcal{F}$. Suppose $\{T_1, \ldots, T_m\}$ is a maximal linearly independent subset of $\mathcal{F}$, and let

$$\{E_{i1}, E_{i2}, \ldots\}$$

be the resolution of the identity defined by $T_i$ $(1 \leq i \leq m)$. Then the projections $E_{ij}$ form a commutative family. For each $E_{ij}$ is a polynomial in $T_i$ and $T_1, \ldots, T_m$ commute with one another. Since

$$I = \Big(\sum_{j_1} E_{1j_1}\Big)\Big(\sum_{j_2} E_{2j_2}\Big) \cdots \Big(\sum_{j_m} E_{mj_m}\Big)$$

each vector $\alpha$ in $V$ may be written in the form

$$\alpha = \sum_{j_1, \ldots, j_m} E_{1j_1}E_{2j_2} \cdots E_{mj_m}\alpha. \tag{9-13}$$

Suppose $j_1, \ldots, j_m$ are indices for which $\beta = E_{1j_1}E_{2j_2} \cdots E_{mj_m}\alpha \neq 0$. Let

$$\beta_i = \Big(\prod_{n \neq i} E_{nj_n}\Big)\alpha.$$

Then $\beta = E_{ij_i}\beta_i$; hence there is a scalar $c_i$ such that

$$T_i\beta = c_i\beta, \qquad 1 \leq i \leq m.$$

For each $T$ in $\mathcal{F}$, there exist unique scalars $b_i$ such that

$$T = \sum_{i=1}^{m} b_iT_i.$$

Thus

$$T\beta = \sum_i b_iT_i\beta = \Big(\sum_i b_ic_i\Big)\beta.$$

The function $T \to \sum_i b_ic_i$ is evidently one of the roots, say $r_t$, of $\mathcal{F}$, and $\beta$ lies in $V(r_t)$. Therefore, each non-zero term in (9-13) belongs to one of the spaces $V(r_1), \ldots, V(r_k)$. It follows that $V$ is the orthogonal direct sum of $V(r_1), \ldots, V(r_k)$. ∎
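When the operators of $\mathcal{F}$ are already given as diagonal matrices in one common orthonormal basis (which is what Theorem 15 guarantees can be arranged), the roots and the subspaces $V(r)$ can be read off by grouping basis vectors according to the tuple of characteristic values they carry. The Python sketch below assumes exactly that representation: each operator is a list of its diagonal entries, and `roots_and_subspaces` is a hypothetical helper.

```python
from collections import defaultdict


def roots_and_subspaces(family):
    """Return {root: basis indices spanning V(root)} for a diagonal family.

    `family` is a list of operators T_1, ..., T_m, each given by its diagonal
    entries in a common orthonormal basis e_0, ..., e_{n-1}. A root r is
    recorded as the tuple (r(T_1), ..., r(T_m)); e_i lies in V(r) exactly
    when T_q e_i = r(T_q) e_i for every q.
    """
    n = len(family[0])
    groups = defaultdict(list)
    for i in range(n):
        root = tuple(diag[i] for diag in family)
        groups[root].append(i)
    return dict(groups)
```

For the family $T_1 = \operatorname{diag}(1, 1, 2, 2)$, $T_2 = \operatorname{diag}(5, 7, 5, 5)$ there are three roots, and the dimensions $1 + 1 + 2$ of the corresponding subspaces add up to $\dim V = 4$, as (ii) requires.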


Corollary. Under the assumptions of the theorem, let $P_j$ be the orthogonal projection of $V$ on $V(r_j)$ $(1 \leq j \leq k)$. Then $P_iP_j = 0$ when $i \neq j$,

$$I = P_1 + \cdots + P_k,$$

and every $T$ in $\mathcal{F}$ may be written in the form

$$T = \sum_j r_j(T)P_j. \tag{9-14}$$


Definitions. The family of orthogonal projections $\{P_1, \ldots, P_k\}$ is called the resolution of the identity determined by $\mathcal{F}$, and (9-14) is the spectral resolution of $T$ in terms of this family.


Although the projections $P_1, \ldots, P_k$ in the preceding corollary are canonically associated with the family $\mathcal{F}$, they are generally not in $\mathcal{F}$ nor even linear combinations of operators in $\mathcal{F}$; however, we shall show that they may be obtained by forming certain products of polynomials in elements of $\mathcal{F}$.

In the study of any family of linear operators on an inner product 
space, it is usually profitable to consider the self-adjoint algebra generated 
by the family. 


Definition. A self-adjoint algebra of operators on an inner 
product space V is a linear subalgebra of L(V, V) which contains the adjoint 
of each of its members. 


An example of a self-adjoint algebra is L(V, V) itself. Since the 
intersection of any collection of self-adjoint algebras is again a self-adjoint 
algebra, the following terminology is meaningful. 


Definition. If $\mathcal{F}$ is a family of linear operators on a finite-dimensional inner product space, the self-adjoint algebra generated by $\mathcal{F}$ is the smallest self-adjoint algebra which contains $\mathcal{F}$.


Theorem 16. Let $\mathcal{F}$ be a commuting family of diagonalizable normal operators on a finite-dimensional inner product space $V$, and let $\mathcal{A}$ be the self-adjoint algebra generated by $\mathcal{F}$ and the identity operator. Let $\{P_1, \ldots, P_k\}$ be the resolution of the identity defined by $\mathcal{F}$. Then $\mathcal{A}$ is the set of all operators on $V$ of the form

$$T = \sum_{j=1}^{k} c_jP_j \tag{9-15}$$

where $c_1, \ldots, c_k$ are arbitrary scalars.


Proof. Let $\mathcal{C}$ denote the set of all operators on $V$ of the form (9-15). Then $\mathcal{C}$ contains the identity operator and the adjoint

$$T^* = \sum_j \bar{c}_jP_j$$

of each of its members. If $T = \sum_j c_jP_j$ and $U = \sum_j d_jP_j$, then for every scalar $a$

$$aT + U = \sum_j (ac_j + d_j)P_j$$

and

$$TU = \sum_{i,j} c_id_jP_iP_j = \sum_j c_jd_jP_j = UT.$$

Thus $\mathcal{C}$ is a self-adjoint commutative algebra containing $\mathcal{F}$ and the identity operator. Therefore $\mathcal{C}$ contains $\mathcal{A}$.
Now let $r_1, \ldots, r_k$ be all the roots of $\mathcal{F}$. Then for each pair of indices $(i, n)$ with $i \neq n$, there is an operator $T_{in}$ in $\mathcal{F}$ such that $r_i(T_{in}) \neq r_n(T_{in})$. Let $a_{in} = r_i(T_{in}) - r_n(T_{in})$ and $b_{in} = r_n(T_{in})$. Then the linear operator

$$Q_i = \prod_{n \neq i} a_{in}^{-1}(T_{in} - b_{in}I)$$

is an element of the algebra $\mathcal{A}$. We will show that $Q_i = P_i$ $(1 \leq i \leq k)$. For this, suppose $j \neq i$ and that $\alpha$ is an arbitrary vector in $V(r_j)$. Then

$$T_{ij}\alpha = r_j(T_{ij})\alpha = b_{ij}\alpha$$

so that $(T_{ij} - b_{ij}I)\alpha = 0$. Since the factors in $Q_i$ all commute, it follows that $Q_i\alpha = 0$. Hence $Q_i$ agrees with $P_i$ on $V(r_j)$ whenever $j \neq i$. Now suppose $\alpha$ is a vector in $V(r_i)$. Then $T_{in}\alpha = r_i(T_{in})\alpha$, and

$$a_{in}^{-1}(T_{in} - b_{in}I)\alpha = a_{in}^{-1}[r_i(T_{in}) - r_n(T_{in})]\alpha = \alpha.$$

Thus $Q_i\alpha = \alpha$ and $Q_i$ agrees with $P_i$ on $V(r_i)$; therefore, $Q_i = P_i$ for $i = 1, \ldots, k$. From this it follows that $\mathcal{A} = \mathcal{C}$. ∎


The theorem shows that the algebra $\mathcal{A}$ is commutative and that each element of $\mathcal{A}$ is a diagonalizable normal operator. We show next that $\mathcal{A}$ has a single generator.

Corollary. Under the assumptions of the theorem, there is an operator $T$ in $\mathcal{A}$ such that every member of $\mathcal{A}$ is a polynomial in $T$.


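Proof. Let $T = \sum_{j=1}^{k} t_jP_j$ where $t_1, \ldots, t_k$ are distinct scalars. Then

$$T^n = \sum_{j=1}^{k} t_j^nP_j$$

for $n = 0, 1, 2, \ldots$ (for $n = 0$ this is $I = P_1 + \cdots + P_k$). If

$$f = \sum_{n=0}^{s} a_nx^n$$

it follows that

$$f(T) = \sum_{n=0}^{s} a_nT^n = \sum_{n=0}^{s}\sum_{j=1}^{k} a_nt_j^nP_j = \sum_{j=1}^{k}\Big(\sum_{n=0}^{s} a_nt_j^n\Big)P_j = \sum_{j=1}^{k} f(t_j)P_j.$$

Given an arbitrary

$$U = \sum_{j=1}^{k} c_jP_j$$

in $\mathcal{A}$, there is a polynomial $f$ such that $f(t_j) = c_j$ $(1 \leq j \leq k)$, and for any such $f$, $U = f(T)$. ∎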
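The proof comes down to interpolation: once $t_1, \ldots, t_k$ are distinct, the polynomial $f$ with $f(t_j) = c_j$ can be taken to be the Lagrange interpolating polynomial. The Python sketch below uses exact rational arithmetic (`fractions.Fraction`) and represents operators by matrices in a basis that diagonalizes the family; `lagrange_coeffs` and `poly_of_matrix` are hypothetical names, and the matrices used with it are chosen only for illustration.

```python
from fractions import Fraction


def lagrange_coeffs(ts, cs):
    """Coefficients [a_0, ..., a_{k-1}] of the f with f(ts[j]) = cs[j]; ts distinct."""
    k = len(ts)
    coeffs = [Fraction(0)] * k
    for j in range(k):
        basis = [Fraction(1)]  # running product of (x - t_i), lowest degree first
        denom = Fraction(1)    # running product of (t_j - t_i)
        for i in range(k):
            if i == j:
                continue
            shifted = [Fraction(0)] * (len(basis) + 1)
            for n, b in enumerate(basis):
                shifted[n] -= ts[i] * b  # the -t_i * b x^n term
                shifted[n + 1] += b      # the x * b x^n term
            basis = shifted
            denom *= ts[j] - ts[i]
        for n, b in enumerate(basis):
            coeffs[n] += cs[j] * b / denom
    return coeffs


def poly_of_matrix(coeffs, t):
    """Evaluate a_0 I + a_1 t + a_2 t^2 + ... for a square matrix t (nested lists)."""
    size = len(t)
    result = [[Fraction(0)] * size for _ in range(size)]
    power = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for a in coeffs:
        for i in range(size):
            for j in range(size):
                result[i][j] += a * power[i][j]
        power = [[sum(power[i][m] * t[m][j] for m in range(size)) for j in range(size)]
                 for i in range(size)]
    return result
```

Take $P_1 = \operatorname{diag}(1, 0, 0, 0)$, $P_2 = \operatorname{diag}(0, 1, 1, 0)$, $P_3 = \operatorname{diag}(0, 0, 0, 1)$, $T = P_1 + 2P_2 + 3P_3$ and $U = 5P_1 - P_2$. The interpolating polynomial is $f = \frac{7}{2}x^2 - \frac{33}{2}x + 18$, and evaluating it at $T$ recovers $U$.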
Exercises


1. Give a reasonable definition of a non-negative $n \times n$ matrix, and then prove that such a matrix has a unique non-negative square root.


2. Let $A$ be an $n \times n$ matrix with complex entries such that $A^* = -A$, and let $B = e^A$. Show that

(a) $\det B = e^{\operatorname{tr} A}$;
(b) $B^* = e^{-A}$;
(c) $B$ is unitary.


3. If U and T are normal operators which commute, prove that U + T and UT 
are normal. 


4. Let $T$ be a linear operator on the finite-dimensional complex inner product space $V$. Prove that the following ten statements about $T$ are equivalent.

(a) $T$ is normal.

(b) $\|T\alpha\| = \|T^*\alpha\|$ for every $\alpha$ in $V$.

(c) $T = T_1 + iT_2$, where $T_1$ and $T_2$ are self-adjoint and $T_1T_2 = T_2T_1$.

(d) If $\alpha$ is a vector and $c$ a scalar such that $T\alpha = c\alpha$, then $T^*\alpha = \bar{c}\alpha$.

(e) There is an orthonormal basis for $V$ consisting of characteristic vectors for $T$.

(f) There is an orthonormal basis $\mathcal{B}$ such that $[T]_{\mathcal{B}}$ is diagonal.

(g) There is a polynomial $g$ with complex coefficients such that $T^* = g(T)$.

(h) Every subspace which is invariant under $T$ is also invariant under $T^*$.

(i) $T = NU$, where $N$ is non-negative, $U$ is unitary, and $N$ commutes with $U$.

(j) $T = c_1E_1 + \cdots + c_kE_k$, where $I = E_1 + \cdots + E_k$, $E_iE_j = 0$ for $i \neq j$, and $E_j^2 = E_j = E_j^*$.


5. Use Exercise 3 to show that any commuting family of normal operators (not 
necessarily diagonalizable ones) on a finite-dimensional inner product space gen- 
erates a commutative self-adjoint algebra of normal operators. 


6. Let $V$ be a finite-dimensional complex inner product space and $U$ a unitary operator on $V$ such that $U\alpha = \alpha$ implies $\alpha = 0$. Let

$$f(z) = i\,\frac{1 + z}{1 - z}, \qquad z \neq 1,$$

and show that

(a) $f(U) = i(I + U)(I - U)^{-1}$;
(b) $f(U)$ is self-adjoint;
(c) for every self-adjoint operator $T$ on $V$, the operator

$$U = (T - iI)(T + iI)^{-1}$$

is unitary and such that $T = f(U)$.

7. Let $V$ be the space of complex $n \times n$ matrices equipped with the inner product

$$(A|B) = \operatorname{tr}(AB^*).$$

If $B$ is an element of $V$, let $L_B$, $R_B$, and $T_B$ denote the linear operators on $V$ defined by

(a) $L_B(A) = BA$.
(b) $R_B(A) = AB$.
(c) $T_B(A) = BA - AB$.

Consider the three families of operators obtained by letting $B$ vary over all diagonal matrices. Show that each of these families is a commutative self-adjoint algebra and find their spectral resolutions.


8. If $B$ is an arbitrary member of the inner product space in Exercise 7, show that $L_B$ is unitarily equivalent to $R_B$.


9. Let $V$ be the inner product space in Exercise 7 and $G$ the group of unitary matrices in $V$. If $B$ is in $G$, let $C_B$ denote the linear operator on $V$ defined by

$$C_B(A) = BAB^{-1}.$$

Show that

(a) $C_B$ is a unitary operator on $V$;
(b) $C_{B_1B_2} = C_{B_1}C_{B_2}$;
(c) there is no unitary transformation $U$ on $V$ such that

$$UL_BU^{-1} = C_B$$

for all $B$ in $G$.


10. Let $\mathcal{F}$ be any family of linear operators on a finite-dimensional inner product space $V$ and $\mathcal{A}$ the self-adjoint algebra generated by $\mathcal{F}$. Show that

(a) each root of $\mathcal{A}$ defines a root of $\mathcal{F}$;
(b) each root $r$ of $\mathcal{A}$ is a multiplicative linear function on $\mathcal{A}$, i.e.,

$$r(TU) = r(T)r(U), \qquad r(cT + U) = cr(T) + r(U)$$

for all $T$ and $U$ in $\mathcal{A}$ and all scalars $c$.


11. Let $\mathcal{F}$ be a commuting family of diagonalizable normal operators on a finite-dimensional inner product space $V$; and let $\mathcal{A}$ be the self-adjoint algebra generated by $\mathcal{F}$ and the identity operator $I$. Show that each root of $\mathcal{A}$ is different from $0$, and that for each root $r$ of $\mathcal{F}$ there is a unique root $s$ of $\mathcal{A}$ such that $s(T) = r(T)$ for all $T$ in $\mathcal{F}$.


12. Let $\mathcal{F}$ be a commuting family of diagonalizable normal operators on a finite-dimensional inner product space $V$ and $\mathcal{A}_0$ the self-adjoint algebra generated by $\mathcal{F}$. Let $\mathcal{A}$ be the self-adjoint algebra generated by $\mathcal{F}$ and the identity operator $I$. Show that

(a) $\mathcal{A}$ is the set of all operators on $V$ of the form $cI + T$ where $c$ is a scalar and $T$ an operator in $\mathcal{A}_0$.

(b) There is at most one root $r$ of $\mathcal{A}$ such that $r(T) = 0$ for all $T$ in $\mathcal{A}_0$.

(c) If one of the roots of $\mathcal{A}$ is $0$ on $\mathcal{A}_0$, the projections $P_1, \ldots, P_k$ in the resolution of the identity defined by $\mathcal{F}$ may be indexed in such a way that $\mathcal{A}_0$ consists of all operators on $V$ of the form

where $c_0, \ldots, c_k$ are arbitrary scalars.

(d) $\mathcal{A} = \mathcal{A}_0$ if and only if for each root $r$ of $\mathcal{A}$ there exists an operator $T$ in $\mathcal{A}_0$ such that $r(T) \neq 0$.
9.6. Further Properties of Normal Operators


In Section 8.5 we developed the basic properties of self-adjoint and 
normal operators, using the simplest and most direct methods possible. 
In Section 9.5 we considered various aspects of spectral theory. Here we 
prove some results of a more technical nature which are mainly about 
normal operators on real spaces. 

We shall begin by proving a sharper version of the primary decompo- 
sition theorem of Chapter 6 for normal operators. It applies to both the 
real and complex cases. 


Theorem 17. Let $T$ be a normal operator on a finite-dimensional inner product space $V$. Let $p$ be the minimal polynomial for $T$ and $p_1, \ldots, p_k$ its distinct monic prime factors. Then each $p_j$ occurs with multiplicity 1 in the factorization of $p$ and has degree 1 or 2. Suppose $W_j$ is the null space of $p_j(T)$. Then

(i) $W_j$ is orthogonal to $W_i$ when $i \neq j$;
(ii) $V = W_1 \oplus \cdots \oplus W_k$;
(iii) $W_j$ is invariant under $T$, and $p_j$ is the minimal polynomial for the restriction of $T$ to $W_j$;
(iv) for every $j$, there is a polynomial $e_j$ with coefficients in the scalar field such that $e_j(T)$ is the orthogonal projection of $V$ on $W_j$.


In the proof we use certain basic facts which we state as lemmas. 


Lemma 1. Let N be a normal operator on an inner product space W. 
Then the null space of N is the orthogonal complement of its range. 


Proof. Suppose $(\alpha|N\beta) = 0$ for all $\beta$ in $W$. Then $(N^*\alpha|\beta) = 0$ for all $\beta$; hence $N^*\alpha = 0$. By Theorem 19 of Chapter 8, this implies $N\alpha = 0$. Conversely, if $N\alpha = 0$, then $N^*\alpha = 0$, and

$$(N^*\alpha|\beta) = (\alpha|N\beta) = 0$$

for all $\beta$ in $W$. ∎


Lemma 2. If $N$ is a normal operator and $\alpha$ is a vector such that $N^2\alpha = 0$, then $N\alpha = 0$.
Proof. Suppose $N$ is normal and that $N^2\alpha = 0$. Then $N\alpha$ lies in the range of $N$ and also lies in the null space of $N$. By Lemma 1, this implies $N\alpha = 0$. ∎
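Normality cannot be dropped from Lemma 2: the nilpotent matrix $\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ satisfies $N^2 = 0$ without being $0$. The Python sketch below, which assumes real matrices so that $N^*$ is the transpose, contrasts that matrix with a normal one by testing the implication $N^2\alpha = 0 \Rightarrow N\alpha = 0$ on a small grid of integer vectors. A grid search is only a sanity check, not a proof, and the helper names are hypothetical.

```python
from itertools import product


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def transpose(x):
    return [list(row) for row in zip(*x)]


def is_normal(m):
    """Real case: m is normal when m m^T = m^T m."""
    return matmul(m, transpose(m)) == matmul(transpose(m), m)


def apply(m, v):
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def lemma2_holds_on_grid(m, radius=2):
    """Check that m^2 v = 0 forces m v = 0 for integer v with |entries| <= radius."""
    zero = [0] * len(m)
    for v in product(range(-radius, radius + 1), repeat=len(m)):
        mv = apply(m, list(v))
        if apply(m, mv) == zero and mv != zero:
            return False
    return True
```

A rotation through a right angle on a plane, extended by $0$ on the orthogonal line, is normal and passes the check; the nilpotent matrix fails it at $\alpha = (0, 1)$.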


Lemma 3. Let $T$ be a normal operator and $f$ any polynomial with coefficients in the scalar field. Then $f(T)$ is also normal.


Proof. Suppose $f = a_0 + a_1x + \cdots + a_nx^n$. Then

$$f(T) = a_0I + a_1T + \cdots + a_nT^n$$

and

$$f(T)^* = \bar{a}_0I + \bar{a}_1T^* + \cdots + \bar{a}_n(T^*)^n.$$

Since $T^*T = TT^*$, it follows that $f(T)$ commutes with $f(T)^*$. ∎


Lemma 4. Let $T$ be a normal operator and $f$, $g$ relatively prime polynomials with coefficients in the scalar field. Suppose $\alpha$ and $\beta$ are vectors such that $f(T)\alpha = 0$ and $g(T)\beta = 0$. Then $(\alpha|\beta) = 0$.


Proof. There are polynomials $a$ and $b$ with coefficients in the scalar field such that $af + bg = 1$. Thus

$$a(T)f(T) + b(T)g(T) = I$$

and $\alpha = g(T)b(T)\alpha$. It follows that

$$(\alpha|\beta) = (g(T)b(T)\alpha|\beta) = (b(T)\alpha|g(T)^*\beta).$$

By assumption $g(T)\beta = 0$. By Lemma 3, $g(T)$ is normal. Therefore, by Theorem 19 of Chapter 8, $g(T)^*\beta = 0$; hence $(\alpha|\beta) = 0$. ∎


Proof of Theorem 17. Recall that the minimal polynomial for $T$ is the monic polynomial of least degree among all polynomials $f$ such that $f(T) = 0$. The existence of such polynomials follows from the assumption that $V$ is finite-dimensional. Suppose some prime factor $p_j$ of $p$ is repeated. Then $p = p_j^2g$ for some polynomial $g$. Since $p(T) = 0$, it follows that

$$(p_j(T))^2g(T)\alpha = 0$$

for every $\alpha$ in $V$. By Lemma 3, $p_j(T)$ is normal. Thus Lemma 2 implies

$$p_j(T)g(T)\alpha = 0$$

for every $\alpha$ in $V$. But this contradicts the assumption that $p$ has least degree among all $f$ such that $f(T) = 0$. Therefore, $p = p_1 \cdots p_k$. If $V$ is a complex inner product space each $p_j$ is necessarily of the form

$$p_j = x - c_j$$

with $c_j$ real or complex. On the other hand, if $V$ is a real inner product space, then $p_j = x - c_j$ with $c_j$ in $R$ or

$$p_j = (x - c)(x - \bar{c})$$

where $c$ is a non-real complex number.
Now let $f_j = p/p_j$. Then, since $f_1, \ldots, f_k$ are relatively prime, there exist polynomials $g_j$ with coefficients in the scalar field such that

$$1 = \sum_j f_jg_j. \tag{9-16}$$

We briefly indicate how such $g_j$ may be constructed. If $p_j = x - c_j$, then $f_j(c_j) \neq 0$, and for $g_j$ we take the scalar polynomial $1/f_j(c_j)$. When every $p_j$ is of this form, the $f_jg_j$ are the familiar Lagrange polynomials associated with $c_1, \ldots, c_k$ and (9-16) is clearly valid. Suppose some $p_j = (x - c)(x - \bar{c})$ with $c$ a non-real complex number. Then $V$ is a real inner product space, and we take

$$g_j = \frac{x - \bar{c}}{s} + \frac{x - c}{\bar{s}}$$

where $s = (c - \bar{c})f_j(c)$. Then

$$g_j = \frac{(s + \bar{s})x - (cs + \bar{c}\bar{s})}{s\bar{s}}$$

so that $g_j$ is a polynomial with real coefficients. If $p$ has degree $n$, then

$$1 - \sum_j f_jg_j$$

is a polynomial with real coefficients of degree at most $n - 1$; moreover, it vanishes at each of the $n$ (complex) roots of $p$, and hence is identically 0.

Now let $\alpha$ be an arbitrary vector in $V$. Then by (9-16)

$$\alpha = \sum_j f_j(T)g_j(T)\alpha$$

and since $p_j(T)f_j(T) = 0$, it follows that $f_j(T)g_j(T)\alpha$ is in $W_j$ for every $j$. By Lemma 4, $W_j$ is orthogonal to $W_i$ whenever $i \neq j$. Therefore, $V$ is the orthogonal direct sum of $W_1, \ldots, W_k$. If $\beta$ is any vector in $W_j$, then

$$p_j(T)T\beta = Tp_j(T)\beta = 0;$$

thus $W_j$ is invariant under $T$. Let $T_j$ be the restriction of $T$ to $W_j$. Then $p_j(T_j) = 0$, so that $p_j$ is divisible by the minimal polynomial for $T_j$. Since $p_j$ is irreducible over the scalar field, it follows that $p_j$ is the minimal polynomial for $T_j$.

Next, let $e_j = f_jg_j$ and $E_j = e_j(T)$. Then for every vector $\alpha$ in $V$, $E_j\alpha$ is in $W_j$, and

$$\alpha = \sum_j E_j\alpha.$$

Thus $\alpha - E_i\alpha = \sum_{j \neq i} E_j\alpha$; since $W_j$ is orthogonal to $W_i$ when $j \neq i$, this implies that $\alpha - E_i\alpha$ is in $W_i^\perp$. It now follows from Theorem 4 of Chapter 8 that $E_i$ is the orthogonal projection of $V$ on $W_i$. ∎
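The construction of the polynomials $g_j$ can be checked numerically. The Python sketch below takes $p = (x - 1)(x^2 + 1)$ over the real field as an assumed example, so $p_1 = x - 1$, $p_2 = (x - i)(x + i)$, $f_1 = x^2 + 1$ and $f_2 = x - 1$. It forms $g_1$ and $g_2$ by the two recipes in the text and confirms (9-16). Polynomials are coefficient lists $[a_0, a_1, \ldots]$, and every function name is hypothetical.

```python
def poly_mul(p, q):
    """Product of polynomials given as coefficient lists [a_0, a_1, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]


def poly_eval(p, x):
    return sum(a * x ** n for n, a in enumerate(p))


def g_for_linear(f_j, c):
    """p_j = x - c with c real: g_j is the constant 1 / f_j(c)."""
    return [1 / poly_eval(f_j, c)]


def g_for_quadratic(f_j, c):
    """p_j = (x - c)(x - conj c): real coefficients [constant, linear] of
    ((s + conj s) x - (c s + conj(c s))) / (s conj s), s = (c - conj c) f_j(c)."""
    s = (c - c.conjugate()) * poly_eval(f_j, c)
    norm = (s * s.conjugate()).real
    return [-(c * s + (c * s).conjugate()).real / norm,
            (s + s.conjugate()).real / norm]
```

Here $g_1 = \tfrac12$ and $g_2 = -\tfrac12(x + 1)$, so $e_1 = \tfrac12(x^2 + 1)$ and $e_2 = -\tfrac12(x^2 - 1)$. For the normal operator on $R^3$ with matrix $\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & -1 \\ 0 & 1 & 0 \end{bmatrix}$, whose minimal polynomial is this $p$, one finds $e_1(T) = \tfrac12(T^2 + I) = \operatorname{diag}(1, 0, 0)$ and $e_2(T) = \tfrac12(I - T^2) = \operatorname{diag}(0, 1, 1)$, the orthogonal projections on the two primary components.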


Definition. We call the subspaces $W_j$ $(1 \leq j \leq k)$ the primary components of $V$ under $T$.
Corollary. Let $T$ be a normal operator on a finite-dimensional inner product space $V$ and $W_1, \ldots, W_k$ the primary components of $V$ under $T$. Suppose $W$ is a subspace of $V$ which is invariant under $T$. Then

$$W = \sum_j W \cap W_j.$$

Proof. Clearly $W$ contains $\sum_j W \cap W_j$. On the other hand, $W$, being invariant under $T$, is invariant under every polynomial in $T$. In particular, $W$ is invariant under the orthogonal projection $E_j$ of $V$ on $W_j$. If $\alpha$ is in $W$, it follows that $E_j\alpha$ is in $W \cap W_j$, and, at the same time, $\alpha = \sum_j E_j\alpha$. Therefore $W$ is contained in $\sum_j W \cap W_j$. ∎


Theorem 17 shows that every normal operator T on a finite- 
dimensional inner product space is canonically specified by a finite number 
of normal operators $T_j$, defined on the primary components $W_j$ of V under
T, each of whose minimal polynomials is irreducible over the field of 
scalars. To complete our understanding of normal operators it is necessary 
to study normal operators of this special type. 

A normal operator whose minimal polynomial is of degree 1 is clearly 
just a scalar multiple of the identity. On the other hand, when the minimal 
polynomial is irreducible and of degree 2 the situation is more complicated. 


EXAMPLE 1. Suppose $r > 0$ and that $\theta$ is a real number which is not an integral multiple of $\pi$. Let $T$ be the linear operator on $R^2$ whose matrix in the standard orthonormal basis is

$$A = r\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$

Then $T$ is a scalar multiple of an orthogonal transformation and hence normal. Let $p$ be the characteristic polynomial of $T$. Then

$$\begin{aligned} p &= \det(xI - A) \\ &= (x - r\cos\theta)^2 + r^2\sin^2\theta \\ &= x^2 - 2r\cos\theta\,x + r^2. \end{aligned}$$

Let $a = r\cos\theta$, $b = r\sin\theta$, and $c = a + ib$. Then $b \neq 0$, $c = re^{i\theta}$,

$$A = \begin{bmatrix} a & -b \\ b & a \end{bmatrix}$$

and $p = (x - c)(x - \bar{c})$. Hence $p$ is irreducible over $R$. Since $p$ is divisible by the minimal polynomial for $T$, it follows that $p$ is the minimal polynomial.
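The computation in Example 1 is easy to confirm numerically. The Python sketch below builds the matrix $A$, reads off the coefficients of $\det(xI - A)$, tests irreducibility over $R$ through the sign of the discriminant, and checks $AA^t = r^2I$, which is the identity $TT^* = (a^2 + b^2)I$ that reappears in the corollary to Theorem 18. The choice $r = 2$, $\theta = \pi/3$ is arbitrary, and the function names are hypothetical.

```python
import math


def rotation_scaling(r, theta):
    """The matrix A = r [[cos t, -sin t], [sin t, cos t]] of Example 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[r * c, -r * s], [r * s, r * c]]


def char_poly_2x2(a):
    """Coefficients (1, -tr A, det A) of det(xI - A) for a 2 x 2 matrix."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return 1.0, -tr, det


def irreducible_over_reals(poly):
    """A monic real quadratic x^2 + p x + q is irreducible iff p^2 - 4q < 0."""
    _, p, q = poly
    return p * p - 4.0 * q < 0.0


def gram(a):
    """A A^T; for the matrix of Example 1 this should equal r^2 I."""
    return [[sum(a[i][k] * a[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]
```

When $\theta$ is a multiple of $\pi$ (for instance $\theta = 0$, where $A = rI$), the discriminant is $0$ and the polynomial splits over $R$, which is why Example 1 excludes those angles.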


This example suggests the following converse. 
Theorem 18. Let $T$ be a normal operator on a finite-dimensional real inner product space $V$ and $p$ its minimal polynomial. Suppose

$$p = (x - a)^2 + b^2$$

where $a$ and $b$ are real and $b \neq 0$. Then there is an integer $s > 0$ such that $p^s$ is the characteristic polynomial for $T$, and there exist subspaces $V_1, \ldots, V_s$ of $V$ such that

(i) $V_j$ is orthogonal to $V_i$ when $i \neq j$;
(ii) $V = V_1 \oplus \cdots \oplus V_s$;
(iii) each $V_j$ has an orthonormal basis $\{\alpha_j, \beta_j\}$ with the property that

$$\begin{aligned} T\alpha_j &= a\alpha_j + b\beta_j \\ T\beta_j &= -b\alpha_j + a\beta_j. \end{aligned}$$

In other words, if $r = \sqrt{a^2 + b^2}$ and $\theta$ is chosen so that $a = r\cos\theta$ and $b = r\sin\theta$, then $V$ is an orthogonal direct sum of two-dimensional subspaces $V_j$ on each of which $T$ acts as "$r$ times rotation through the angle $\theta$".

The proof of Theorem 18 will be based on the following result. 


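Lemma. Let $V$ be a real inner product space and $S$ a normal operator on $V$ such that $S^2 + I = 0$. Let $\alpha$ be any vector in $V$ and $\beta = S\alpha$. Then

$$\begin{aligned} S^*\alpha &= -\beta \\ S^*\beta &= \alpha, \end{aligned} \tag{9-17}$$

$(\alpha|\beta) = 0$, and $\|\alpha\| = \|\beta\|$.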


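Proof. We have $S\alpha = \beta$ and $S\beta = S^2\alpha = -\alpha$. Therefore

$$0 = \|S\alpha - \beta\|^2 + \|S\beta + \alpha\|^2 = \|S\alpha\|^2 - 2(S\alpha|\beta) + \|\beta\|^2 + \|S\beta\|^2 + 2(S\beta|\alpha) + \|\alpha\|^2.$$

Since $S$ is normal, $\|S\gamma\| = \|S^*\gamma\|$ for every $\gamma$; and since the scalars are real, $(S\alpha|\beta) = (S^*\beta|\alpha)$ and $(S\beta|\alpha) = (S^*\alpha|\beta)$. It follows that

$$\begin{aligned} 0 &= \|S^*\alpha\|^2 - 2(S^*\beta|\alpha) + \|\beta\|^2 + \|S^*\beta\|^2 + 2(S^*\alpha|\beta) + \|\alpha\|^2 \\ &= \|S^*\alpha + \beta\|^2 + \|S^*\beta - \alpha\|^2. \end{aligned}$$

This implies (9-17); hence

$$(\alpha|\beta) = (S^*\beta|\beta) = (\beta|S\beta) = (\beta|-\alpha) = -(\alpha|\beta)$$

and $(\alpha|\beta) = 0$. Similarly

$$\|\alpha\|^2 = (S^*\beta|\alpha) = (\beta|S\alpha) = \|\beta\|^2.$$

∎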
Proof of Theorem 18. Let $V_1, \ldots, V_s$ be a maximal collection of two-dimensional subspaces satisfying (i) and (iii), and the additional conditions

$$\begin{aligned} T^*\alpha_j &= a\alpha_j - b\beta_j \\ T^*\beta_j &= b\alpha_j + a\beta_j \end{aligned} \qquad 1 \leq j \leq s. \tag{9-18}$$

Let $W = V_1 + \cdots + V_s$. Then $W$ is the orthogonal direct sum of $V_1, \ldots, V_s$. We shall show that $W = V$. Suppose that this is not the case. Then $W^\perp \neq \{0\}$. Moreover, since (iii) and (9-18) imply that $W$ is invariant under $T$ and $T^*$, it follows that $W^\perp$ is invariant under $T^*$ and $T = T^{**}$. Let $S = b^{-1}(T - aI)$. Then $S^* = b^{-1}(T^* - aI)$, $S^*S = SS^*$, and $W^\perp$ is invariant under $S$ and $S^*$. Since $(T - aI)^2 + b^2I = 0$, it follows that $S^2 + I = 0$. Let $\alpha$ be any vector of norm 1 in $W^\perp$ and set $\beta = S\alpha$. Then $\beta$ is in $W^\perp$ and $S\beta = -\alpha$. Since $T = aI + bS$, this implies

$$\begin{aligned} T\alpha &= a\alpha + b\beta \\ T\beta &= -b\alpha + a\beta. \end{aligned}$$

By the lemma, $S^*\alpha = -\beta$, $S^*\beta = \alpha$, $(\alpha|\beta) = 0$, and $\|\beta\| = 1$. Because $T^* = aI + bS^*$, it follows that

$$\begin{aligned} T^*\alpha &= a\alpha - b\beta \\ T^*\beta &= b\alpha + a\beta. \end{aligned}$$

But this contradicts the fact that $V_1, \ldots, V_s$ is a maximal collection of subspaces satisfying (i), (iii), and (9-18). Therefore, $W = V$, and since

$$\det\begin{bmatrix} x - a & b \\ -b & x - a \end{bmatrix} = (x - a)^2 + b^2$$

it follows from (i), (ii) and (iii) that

$$\det(xI - T) = [(x - a)^2 + b^2]^s.$$

∎


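Corollary. Under the conditions of the theorem, $T$ is invertible, and

$$T^* = (a^2 + b^2)T^{-1}.$$

Proof. Since

$$\begin{bmatrix} a & -b \\ b & a \end{bmatrix}\begin{bmatrix} a & b \\ -b & a \end{bmatrix} = \begin{bmatrix} a^2 + b^2 & 0 \\ 0 & a^2 + b^2 \end{bmatrix}$$

it follows from (iii) and (9-18) that $TT^* = (a^2 + b^2)I$. Hence $T$ is invertible and $T^* = (a^2 + b^2)T^{-1}$.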


Theorem 19. Let $T$ be a normal operator on a finite-dimensional inner product space $V$. Then any linear operator that commutes with $T$ also commutes with $T^*$. Moreover, every subspace invariant under $T$ is also invariant under $T^*$.


Proof. Suppose $U$ is a linear operator on $V$ that commutes with $T$. Let $E_j$ be the orthogonal projection of $V$ on the primary component $W_j$ $(1 \leq j \leq k)$ of $V$ under $T$. Then $E_j$ is a polynomial in $T$ and hence commutes with $U$. Thus

$$E_jUE_j = UE_j^2 = UE_j.$$

Thus $U(W_j)$ is a subset of $W_j$. Let $T_j$ and $U_j$ denote the restrictions of $T$ and $U$ to $W_j$. Suppose $I_j$ is the identity operator on $W_j$. Then $U_j$ commutes with $T_j$, and if $T_j = c_jI_j$, it is clear that $U_j$ also commutes with $T_j^* = \bar{c}_jI_j$. On the other hand, if $T_j$ is not a scalar multiple of $I_j$, then $T_j$ is invertible and there exist real numbers $a_j$ and $b_j$ such that

$$T_j^* = (a_j^2 + b_j^2)T_j^{-1}.$$

Since $U_jT_j = T_jU_j$, it follows that $T_j^{-1}U_j = U_jT_j^{-1}$. Therefore $U_j$ commutes with $T_j^*$ in both cases. Now $T^*$ also commutes with $E_j$, and hence $W_j$ is invariant under $T^*$. Moreover for every $\alpha$ and $\beta$ in $W_j$

$$(T_j\alpha|\beta) = (T\alpha|\beta) = (\alpha|T^*\beta) = (\alpha|T_j^*\beta).$$

Since $T^*(W_j)$ is contained in $W_j$, this implies $T_j^*$ is the restriction of $T^*$ to $W_j$. Thus

$$UT^*\alpha_j = T^*U\alpha_j$$

for every $\alpha_j$ in $W_j$. Since $V$ is the sum of $W_1, \ldots, W_k$, it follows that

$$UT^*\alpha = T^*U\alpha$$

for every $\alpha$ in $V$ and hence that $U$ commutes with $T^*$.

Now suppose $W$ is a subspace of $V$ that is invariant under $T$, and let $Z_j = W \cap W_j$. By the corollary to Theorem 17, $W = \sum_j Z_j$. Thus it suffices to show that each $Z_j$ is invariant under $T_j^*$. This is clear if $T_j = c_jI_j$. When this is not the case, $T_j$ is invertible and maps $Z_j$ into and hence onto $Z_j$. Thus $T_j^{-1}(Z_j) = Z_j$ and since

$$T_j^* = (a_j^2 + b_j^2)T_j^{-1}$$

it follows that $T^*(Z_j)$ is contained in $Z_j$, for every $j$. ∎


Suppose $T$ is a normal operator on a finite-dimensional inner product space $V$. Let $W$ be a subspace invariant under $T$. Then Theorem 19 shows that $W$ is invariant under $T^*$. From this it follows that $W^\perp$ is invariant under $T^{**} = T$ (and hence under $T^*$ as well). Using this fact one can easily prove the following strengthened version of the cyclic decomposition theorem given in Chapter 7.


Theorem 20. Let $T$ be a normal linear operator on a finite-dimensional inner product space $V$ $(\dim V \geq 1)$. Then there exist $r$ non-zero vectors $\alpha_1, \ldots, \alpha_r$ in $V$ with respective $T$-annihilators $e_1, \ldots, e_r$ such that

(i) $V = Z(\alpha_1; T) \oplus \cdots \oplus Z(\alpha_r; T)$;
(ii) if $1 \leq k \leq r - 1$, then $e_{k+1}$ divides $e_k$;
(iii) $Z(\alpha_j; T)$ is orthogonal to $Z(\alpha_k; T)$ when $j \neq k$. Furthermore, the integer $r$ and the annihilators $e_1, \ldots, e_r$ are uniquely determined by conditions (i) and (ii) and the fact that no $\alpha_k$ is 0.


Corollary. If $A$ is a normal matrix with real (complex) entries, then there is a real orthogonal (unitary) matrix $P$ such that $P^{-1}AP$ is in rational canonical form.


It follows that two normal matrices A and B are unitarily equivalent 
if and only if they have the same rational form; A and B are orthogonally 
equivalent if they have real entries and the same rational form. 

On the other hand, there is a simpler criterion for the unitary equiv- 
alence of normal matrices and normal operators. 


Definitions. Let $V$ and $V'$ be inner product spaces over the same field. A linear transformation

$$U: V \to V'$$

is called a unitary transformation if it maps $V$ onto $V'$ and preserves inner products. If $T$ is a linear operator on $V$ and $T'$ a linear operator on $V'$, then $T$ is unitarily equivalent to $T'$ if there exists a unitary transformation $U$ of $V$ onto $V'$ such that

$$UTU^{-1} = T'.$$


Lemma. Let $V$ and $V'$ be finite-dimensional inner product spaces over the same field. Suppose $T$ is a linear operator on $V$ and that $T'$ is a linear operator on $V'$. Then $T$ is unitarily equivalent to $T'$ if and only if there is an orthonormal basis $\mathcal{B}$ of $V$ and an orthonormal basis $\mathcal{B}'$ of $V'$ such that

$$[T]_{\mathcal{B}} = [T']_{\mathcal{B}'}.$$


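Proof. Suppose there is a unitary transformation $U$ of $V$ onto $V'$ such that $UTU^{-1} = T'$. Let $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ be any (ordered) orthonormal basis for $V$. Let $\alpha_j' = U\alpha_j$ $(1 \leq j \leq n)$. Then $\mathcal{B}' = \{\alpha_1', \ldots, \alpha_n'\}$ is an orthonormal basis for $V'$ and setting

$$T\alpha_j = \sum_{k=1}^{n} A_{kj}\alpha_k$$

we see that

$$\begin{aligned} T'\alpha_j' &= UT\alpha_j \\ &= \sum_k A_{kj}U\alpha_k \\ &= \sum_k A_{kj}\alpha_k'. \end{aligned}$$

Hence $[T]_{\mathcal{B}} = A = [T']_{\mathcal{B}'}$.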







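Conversely, suppose there is an orthonormal basis $\mathcal{B}$ of $V$ and an orthonormal basis $\mathcal{B}'$ of $V'$ such that

$$[T]_{\mathcal{B}} = [T']_{\mathcal{B}'},$$

and let $A = [T]_{\mathcal{B}}$. Suppose $\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$ and that $\mathcal{B}' = \{\alpha_1', \ldots, \alpha_n'\}$. Let $U$ be the linear transformation of $V$ into $V'$ such that $U\alpha_j = \alpha_j'$ $(1 \leq j \leq n)$. Then $U$ is a unitary transformation of $V$ onto $V'$, and

$$\begin{aligned} UTU^{-1}\alpha_j' &= UT\alpha_j \\ &= U\sum_k A_{kj}\alpha_k \\ &= \sum_k A_{kj}\alpha_k'. \end{aligned}$$

Therefore, $UTU^{-1}\alpha_j' = T'\alpha_j'$ $(1 \leq j \leq n)$, and this implies $UTU^{-1} = T'$. ∎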


It follows immediately from the lemma that unitarily equivalent 
operators on finite-dimensional spaces have the same characteristic poly- 
nomial. For normal operators the converse is valid. 


Theorem 21. Let $V$ and $V'$ be finite-dimensional inner product spaces over the same field. Suppose $T$ is a normal operator on $V$ and that $T'$ is a normal operator on $V'$. Then $T$ is unitarily equivalent to $T'$ if and only if $T$ and $T'$ have the same characteristic polynomial.


Proof. Suppose $T$ and $T'$ have the same characteristic polynomial $f$. Let $W_j$ $(1 \leq j \leq k)$ be the primary components of $V$ under $T$ and $T_j$ the restriction of $T$ to $W_j$. Suppose $I_j$ is the identity operator on $W_j$. Then

$$f = \prod_{j=1}^{k} \det(xI_j - T_j).$$

Let $p_j$ be the minimal polynomial for $T_j$. If $p_j = x - c_j$ it is clear that

$$\det(xI_j - T_j) = (x - c_j)^{s_j}$$

where $s_j$ is the dimension of $W_j$. On the other hand, if $p_j = (x - a_j)^2 + b_j^2$ with $a_j$, $b_j$ real and $b_j \neq 0$, then it follows from Theorem 18 that

$$\det(xI_j - T_j) = p_j^{s_j}$$

where in this case $2s_j$ is the dimension of $W_j$. Therefore $f = \prod_j p_j^{s_j}$. Now we can also compute $f$ by the same method using the primary components of $V'$ under $T'$. Since $p_1, \ldots, p_k$ are distinct primes, it follows from the uniqueness of the prime factorization of $f$ that there are exactly $k$ primary components $W_j'$ $(1 \leq j \leq k)$ of $V'$ under $T'$ and that these may be indexed in such a way that $p_j$ is the minimal polynomial for the restriction $T_j'$ of $T'$ to $W_j'$. If $p_j = x - c_j$, then $T_j = c_jI_j$ and $T_j' = c_jI_j'$ where $I_j'$ is the identity operator on $W_j'$. In this case it is evident that $T_j$ is unitarily equivalent to $T_j'$. If $p_j = (x - a_j)^2 + b_j^2$, as above, then using the lemma and Theorem 20, we again see that $T_j$ is unitarily equivalent to $T_j'$. Thus for each $j$ there are orthonormal bases $\mathcal{B}_j$ and $\mathcal{B}_j'$ of $W_j$ and $W_j'$, respectively, such that

$$[T_j]_{\mathcal{B}_j} = [T_j']_{\mathcal{B}_j'}.$$

Now let $U$ be the linear transformation of $V$ into $V'$ that maps each $\mathcal{B}_j$ onto $\mathcal{B}_j'$. Then $U$ is a unitary transformation of $V$ onto $V'$ such that $UTU^{-1} = T'$. ∎
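The hypothesis that both operators are normal is essential in Theorem 21. The Python sketch below works with real $2 \times 2$ matrices (so "unitary" means "orthogonal"; all names are hypothetical). It exhibits an orthogonal $U$ with $UAU^{-1} = B$ for the normal matrices $A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}$ and $B = \operatorname{diag}(0, 2)$, which share the characteristic polynomial $x^2 - 2x$, and it sets up a non-normal matrix with the same characteristic polynomial as the zero matrix.

```python
import math


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(x):
    return [[x[j][i] for j in range(2)] for i in range(2)]


def char_poly_2x2(a):
    """(1, -tr A, det A): coefficients of det(xI - A)."""
    return (1, -(a[0][0] + a[1][1]), a[0][0] * a[1][1] - a[0][1] * a[1][0])


def is_normal(a):
    return matmul(a, transpose(a)) == matmul(transpose(a), a)


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(2) for j in range(2))


A = [[1, 1], [1, 1]]
B = [[0, 0], [0, 2]]
h = 1 / math.sqrt(2)
U = [[h, -h], [h, h]]  # rows: unit characteristic vectors of A for 0 and for 2
N = [[0, 1], [0, 0]]   # not normal; same characteristic polynomial x^2 as Z
Z = [[0, 0], [0, 0]]
```

For the non-normal $N$, no unitary $U$ can work: $UNU^{-1} = 0$ would force $N = 0$.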


10. Bilinear Forms


10.1. Bilinear Forms 


In this chapter, we treat bilinear forms on finite-dimensional vector
spaces. The reader will probably observe a similarity between some of the 
material and the discussion of determinants in Chapter 5 and of inner 
products and forms in Chapter 8 and in Chapter 9. The relation between 
bilinear forms and inner products is particularly strong; however, this 
chapter does not presuppose any of the material in Chapter 8 or Chapter 9. 
The reader who is not familiar with inner products would probably profit 
by reading the first part of Chapter 8 as he reads the discussion of bilinear 
forms. 

This first section treats the space of bilinear forms on a vector space 
of dimension n. The matrix of a bilinear form in an ordered basis is intro- 
duced, and the isomorphism between the space of forms and the space of 
n X n matrices is established. The rank of a bilinear form is defined, and 
non-degenerate bilinear forms are introduced. The second section discusses 
symmetric bilinear forms and their diagonalization. The third section 
treats skew-symmetric bilinear forms. The fourth section discusses the 
group preserving a non-degenerate bilinear form, with special attention 
given to the orthogonal groups, the pseudo-orthogonal groups, and a 
particular pseudo-orthogonal group—the Lorentz group. 


Definition. Let $V$ be a vector space over the field $F$. A bilinear form on $V$ is a function $f$, which assigns to each ordered pair of vectors $\alpha$, $\beta$ in $V$ a scalar $f(\alpha, \beta)$ in $F$, and which satisfies

$$\begin{aligned} f(c\alpha_1 + \alpha_2, \beta) &= cf(\alpha_1, \beta) + f(\alpha_2, \beta) \\ f(\alpha, c\beta_1 + \beta_2) &= cf(\alpha, \beta_1) + f(\alpha, \beta_2). \end{aligned} \tag{10-1}$$


If we let $V \times V$ denote the set of all ordered pairs of vectors in $V$, this definition can be rephrased as follows: A bilinear form on $V$ is a function $f$ from $V \times V$ into $F$ which is linear as a function of either of its arguments when the other is fixed. The zero function from $V \times V$ into $F$ is clearly a bilinear form. It is also true that any linear combination of bilinear forms on $V$ is again a bilinear form. To prove this, it is sufficient to consider linear combinations of the type $cf + g$, where $f$ and $g$ are bilinear forms on $V$. The proof that $cf + g$ satisfies (10-1) is similar to many others we have given, and we shall thus omit it. All this may be summarized by saying that the set of all bilinear forms on $V$ is a subspace of the space of all functions from $V \times V$ into $F$ (Example 3, Chapter 2). We shall denote the space of bilinear forms on $V$ by $L(V, V, F)$.


EXAMPLE 1. Let V be a vector space over the field F and let L₁ and L₂ be linear functions on V. Define f by

$$f(\alpha, \beta) = L_1(\alpha)L_2(\beta).$$

If we fix β and regard f as a function of α, then we simply have a scalar multiple of the linear functional L₁. With α fixed, f is a scalar multiple of L₂. Thus it is clear that f is a bilinear form on V.


EXAMPLE 2. Let m and n be positive integers and F a field. Let V be the vector space of all m × n matrices over F. Let A be a fixed m × m matrix over F. Define

$$f_A(X, Y) = \operatorname{tr}(X^t A Y).$$

Then f_A is a bilinear form on V. For, if X, Y, and Z are m × n matrices over F,

$$\begin{aligned} f_A(cX + Z, Y) &= \operatorname{tr}[(cX + Z)^t A Y]\\ &= \operatorname{tr}(cX^t A Y) + \operatorname{tr}(Z^t A Y)\\ &= c\,f_A(X, Y) + f_A(Z, Y). \end{aligned}$$

Of course, we have used the fact that the transpose operation and the trace function are linear. It is even easier to show that f_A is linear as a function of its second argument. In the special case n = 1, the matrix XᵗAY is 1 × 1, i.e., a scalar, and the bilinear form is simply

$$f_A(X, Y) = X^t A Y = \sum_i \sum_j A_{ij} x_i y_j.$$

We shall presently show that every bilinear form on the space of m × 1 matrices is of this type, i.e., is f_A for some m × m matrix A.
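As a quick numerical sanity check of the bilinearity of f_A, here is a minimal Python sketch. It is an illustration of our own, not part of the text: the helper names (`transpose`, `matmul`, `trace`, `f_A`, `add`, `scale`) and the sample matrices are assumptions, and `fractions.Fraction` is used so that the equality test is exact. It evaluates f_A(X, Y) = tr(XᵗAY) on 2 × 3 matrices and compares f_A(cX + Z, Y) with c f_A(X, Y) + f_A(Z, Y).

```python
from fractions import Fraction

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def f_A(A, X, Y):
    """f_A(X, Y) = tr(X^t A Y) for m x n matrices X, Y and a fixed m x m matrix A."""
    return trace(matmul(matmul(transpose(X), A), Y))

def add(X, Y):
    """Entrywise sum X + Y."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    """Scalar multiple cX."""
    return [[c * a for a in row] for row in X]

# m = 2, n = 3: A is 2 x 2 and X, Y, Z are 2 x 3 (sample values of our own choosing).
A = [[1, 2], [3, 4]]
X = [[1, 0, 2], [0, 1, 1]]
Y = [[2, 1, 0], [1, 1, 3]]
Z = [[0, 3, 1], [1, 0, 2]]
c = Fraction(5, 2)

lhs = f_A(A, add(scale(c, X), Z), Y)
rhs = c * f_A(A, X, Y) + f_A(A, Z, Y)
print(lhs == rhs)                          # True: f_A is linear in its first argument

# Special case n = 1: f_A(X, Y) = X^t A Y = sum over i, j of A_ij x_i y_j.
print(f_A(A, [[1], [2]], [[3], [4]]))      # 61
```

The same helpers can be reused to check linearity in the second argument by swapping the roles of the arguments.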
EXAMPLE 3. Let F be a field. Let us find all bilinear forms on the space F². Suppose f is such a bilinear form. If α = (x₁, x₂) and β = (y₁, y₂) are vectors in F², then

$$\begin{aligned} f(\alpha, \beta) &= f(x_1\epsilon_1 + x_2\epsilon_2, \beta)\\ &= x_1 f(\epsilon_1, \beta) + x_2 f(\epsilon_2, \beta)\\ &= x_1 f(\epsilon_1, y_1\epsilon_1 + y_2\epsilon_2) + x_2 f(\epsilon_2, y_1\epsilon_1 + y_2\epsilon_2)\\ &= x_1y_1 f(\epsilon_1, \epsilon_1) + x_1y_2 f(\epsilon_1, \epsilon_2) + x_2y_1 f(\epsilon_2, \epsilon_1) + x_2y_2 f(\epsilon_2, \epsilon_2). \end{aligned}$$

Thus f is completely determined by the four scalars Aᵢⱼ = f(εᵢ, εⱼ) by

$$f(\alpha, \beta) = A_{11}x_1y_1 + A_{12}x_1y_2 + A_{21}x_2y_1 + A_{22}x_2y_2 = \sum_{i,j} A_{ij} x_i y_j.$$

If X and Y are the coordinate matrices of α and β, and if A is the 2 × 2 matrix with entries A(i, j) = Aᵢⱼ = f(εᵢ, εⱼ), then

$$f(\alpha, \beta) = X^t A Y. \tag{10-2}$$

We observed in Example 2 that if A is any 2 × 2 matrix over F, then (10-2) defines a bilinear form on F². We see that the bilinear forms on F² are precisely those obtained from a 2 × 2 matrix as in (10-2).

The discussion in Example 3 can be generalized so as to describe all bilinear forms on a finite-dimensional vector space. Let V be a finite-dimensional vector space over the field F and let ℬ = {α₁, ..., αₙ} be an ordered basis for V. Suppose f is a bilinear form on V. If

$$\alpha = x_1\alpha_1 + \cdots + x_n\alpha_n \quad\text{and}\quad \beta = y_1\alpha_1 + \cdots + y_n\alpha_n$$

are vectors in V, then

$$\begin{aligned} f(\alpha, \beta) &= f\Big(\sum_i x_i\alpha_i, \beta\Big) = \sum_i x_i f(\alpha_i, \beta)\\ &= \sum_i x_i f\Big(\alpha_i, \sum_j y_j\alpha_j\Big) = \sum_i \sum_j x_i y_j f(\alpha_i, \alpha_j). \end{aligned}$$

If we let Aᵢⱼ = f(αᵢ, αⱼ), then

$$f(\alpha, \beta) = \sum_i \sum_j A_{ij} x_i y_j = X^t A Y$$

where X and Y are the coordinate matrices of α and β in the ordered basis ℬ. Thus every bilinear form on V is of the type

$$f(\alpha, \beta) = [\alpha]_{\mathcal{B}}^t A [\beta]_{\mathcal{B}} \tag{10-3}$$

for some n × n matrix A over F. Conversely, if we are given any n × n matrix A, it is easy to see that (10-3) defines a bilinear form f on V such that Aᵢⱼ = f(αᵢ, αⱼ).
Definition. Let V be a finite-dimensional vector space, and let ℬ = {α₁, ..., αₙ} be an ordered basis for V. If f is a bilinear form on V, the matrix of f in the ordered basis ℬ is the n × n matrix A with entries Aᵢⱼ = f(αᵢ, αⱼ). At times, we shall denote this matrix by [f]_ℬ.
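The definition of [f]_ℬ translates directly into code. In this minimal Python sketch, the form f on Q² and the basis are hypothetical examples of our own choosing, vectors are stored as tuples, and `matrix_of_form`, `evaluate` and `combine` are illustrative helper names. It builds A = [f]_ℬ entry by entry as Aᵢⱼ = f(αᵢ, αⱼ) and then checks (10-3) for one pair of coordinate vectors.

```python
from fractions import Fraction

def matrix_of_form(f, basis):
    """[f]_B: the n x n matrix with entries A[i][j] = f(basis[i], basis[j])."""
    return [[f(a, b) for b in basis] for a in basis]

def evaluate(A, x, y):
    """X^t A Y for coordinate vectors x, y (given as flat lists)."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def combine(coeffs, basis):
    """The vector sum_i coeffs[i] * basis[i], with vectors stored as tuples."""
    dim = len(basis[0])
    return tuple(sum(c * v[k] for c, v in zip(coeffs, basis)) for k in range(dim))

def f(a, b):
    """A sample bilinear form on Q^2 (our own example): a1 b1 + 2 a1 b2 - a2 b1."""
    return a[0] * b[0] + 2 * a[0] * b[1] - a[1] * b[0]

basis = [(1, 1), (1, -1)]            # an ordered basis B of Q^2
A = matrix_of_form(f, basis)
print(A)                             # [[2, -2], [4, 0]]

x = [Fraction(3), Fraction(-2)]      # coordinates of alpha in B
y = [Fraction(1, 2), Fraction(4)]    # coordinates of beta in B
alpha, beta = combine(x, basis), combine(y, basis)
print(f(alpha, beta) == evaluate(A, x, y))   # True, as (10-3) asserts
```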


Theorem 1. Let V be a finite-dimensional vector space over the field F. For each ordered basis ℬ of V, the function which associates with each bilinear form on V its matrix in the ordered basis ℬ is an isomorphism of the space L(V, V, F) onto the space of n × n matrices over the field F.


Proof. We observed above that f → [f]_ℬ is a one-one correspondence between the set of bilinear forms on V and the set of all n × n matrices over F. That this is a linear transformation is easy to see, because

$$(cf + g)(\alpha_i, \alpha_j) = c\,f(\alpha_i, \alpha_j) + g(\alpha_i, \alpha_j)$$

for each i and j. This simply says that

$$[cf + g]_{\mathcal{B}} = c[f]_{\mathcal{B}} + [g]_{\mathcal{B}}. \qquad \blacksquare$$
Corollary. If ℬ = {α₁, ..., αₙ} is an ordered basis for V, and ℬ* = {L₁, ..., Lₙ} is the dual basis for V*, then the n² bilinear forms

$$f_{ij}(\alpha, \beta) = L_i(\alpha) L_j(\beta), \qquad 1 \le i \le n,\ 1 \le j \le n$$

form a basis for the space L(V, V, F). In particular, the dimension of L(V, V, F) is n².


Proof. The dual basis {L₁, ..., Lₙ} is essentially defined by the fact that Lᵢ(α) is the ith coordinate of α in the ordered basis ℬ (for any α in V). Now the functions fᵢⱼ defined by

$$f_{ij}(\alpha, \beta) = L_i(\alpha) L_j(\beta)$$

are bilinear forms of the type considered in Example 1. If

$$\alpha = x_1\alpha_1 + \cdots + x_n\alpha_n \quad\text{and}\quad \beta = y_1\alpha_1 + \cdots + y_n\alpha_n,$$

then

$$f_{ij}(\alpha, \beta) = x_i y_j.$$

Let f be any bilinear form on V and let A be the matrix of f in the ordered basis ℬ. Then

$$f(\alpha, \beta) = \sum_{i,j} A_{ij} x_i y_j$$

which simply says that

$$f = \sum_{i,j} A_{ij} f_{ij}.$$

It is now clear that the n² forms fᵢⱼ comprise a basis for L(V, V, F). ∎


One can rephrase the proof of the corollary as follows. The bilinear form fᵢⱼ has as its matrix in the ordered basis ℬ the matrix 'unit' Eⁱʲ, whose only non-zero entry is a 1 in row i and column j. Since these matrix units comprise a basis for the space of n × n matrices, the forms fᵢⱼ comprise a basis for the space of bilinear forms.

The concept of the matrix of a bilinear form in an ordered basis is similar to that of the matrix of a linear operator in an ordered basis. Just as for linear operators, we shall be interested in what happens to the matrix representing a bilinear form, as we change from one ordered basis to another. So, suppose ℬ = {α₁, ..., αₙ} and ℬ' = {α'₁, ..., α'ₙ} are two ordered bases for V and that f is a bilinear form on V. How are the matrices [f]_ℬ and [f]_ℬ' related? Well, let P be the (invertible) n × n matrix such that

$$[\alpha]_{\mathcal{B}} = P[\alpha]_{\mathcal{B}'}$$

for all α in V. In other words, define P by

$$\alpha'_j = \sum_{i=1}^{n} P_{ij}\alpha_i.$$

For any vectors α, β in V

$$\begin{aligned} f(\alpha, \beta) &= [\alpha]_{\mathcal{B}}^t [f]_{\mathcal{B}} [\beta]_{\mathcal{B}}\\ &= (P[\alpha]_{\mathcal{B}'})^t [f]_{\mathcal{B}} P[\beta]_{\mathcal{B}'}\\ &= [\alpha]_{\mathcal{B}'}^t (P^t [f]_{\mathcal{B}} P) [\beta]_{\mathcal{B}'}. \end{aligned}$$

By the definition and uniqueness of the matrix representing f in the ordered basis ℬ', we must have

$$[f]_{\mathcal{B}'} = P^t [f]_{\mathcal{B}} P. \tag{10-4}$$


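EXAMPLE 4. Let V be the vector space R². Let f be the bilinear form defined on α = (x₁, x₂) and β = (y₁, y₂) by

$$f(\alpha, \beta) = x_1y_1 + x_1y_2 + x_2y_1 + x_2y_2.$$

Now

$$f(\alpha, \beta) = \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix} \begin{bmatrix} y_1\\ y_2 \end{bmatrix}$$

and so the matrix of f in the standard ordered basis ℬ = {ε₁, ε₂} is

$$[f]_{\mathcal{B}} = \begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix}.$$

Let ℬ' = {ε'₁, ε'₂} be the ordered basis defined by ε'₁ = (1, −1), ε'₂ = (1, 1). In this case, the matrix P which changes coordinates from ℬ' to ℬ is

$$P = \begin{bmatrix} 1 & 1\\ -1 & 1 \end{bmatrix}.$$

Thus

$$\begin{aligned} [f]_{\mathcal{B}'} &= P^t [f]_{\mathcal{B}} P\\ &= \begin{bmatrix} 1 & -1\\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1\\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 & 1\\ -1 & 1 \end{bmatrix}\\ &= \begin{bmatrix} 1 & -1\\ 1 & 1 \end{bmatrix} \begin{bmatrix} 0 & 2\\ 0 & 2 \end{bmatrix}\\ &= \begin{bmatrix} 0 & 0\\ 0 & 4 \end{bmatrix}. \end{aligned}$$

What this means is that if we express the vectors α and β by means of their coordinates in the basis ℬ', say

$$\alpha = x'_1\epsilon'_1 + x'_2\epsilon'_2, \qquad \beta = y'_1\epsilon'_1 + y'_2\epsilon'_2$$

then

$$f(\alpha, \beta) = 4x'_2y'_2.$$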
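The computation of Example 4 is easy to reproduce. The short Python sketch below uses plain integer arithmetic; `transpose`, `matmul`, `f` and `from_prime` are our own illustrative names. It forms PᵗAP as in (10-4) and spot-checks the identity f(α, β) = 4x'₂y'₂ for one arbitrarily chosen pair of ℬ'-coordinate vectors.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[1, 1], [1, 1]]      # [f]_B in the standard ordered basis
P = [[1, 1], [-1, 1]]     # column j holds the B-coordinates of e'_j
A_prime = matmul(matmul(transpose(P), A), P)    # formula (10-4)
print(A_prime)            # [[0, 0], [0, 4]]

def f(a, b):
    """The form of Example 4: x1 y1 + x1 y2 + x2 y1 + x2 y2."""
    return a[0] * b[0] + a[0] * b[1] + a[1] * b[0] + a[1] * b[1]

def from_prime(c):
    """Standard coordinates P [alpha]_B' computed from B'-coordinates c."""
    return [P[0][0] * c[0] + P[0][1] * c[1], P[1][0] * c[0] + P[1][1] * c[1]]

xp, yp = [3, 5], [-2, 7]  # arbitrary B'-coordinates of alpha and beta
print(f(from_prime(xp), from_prime(yp)) == 4 * xp[1] * yp[1])   # True
```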


One consequence of the change of basis formula (10-4) is the following: If A and B are n × n matrices which represent the same bilinear form on V in (possibly) different ordered bases, then A and B have the same rank. For, if P is an invertible n × n matrix and B = PᵗAP, it is evident that A and B have the same rank. This makes it possible to define the rank of a bilinear form on V as the rank of any matrix which represents the form in an ordered basis for V.

It is desirable to give a more intrinsic definition of the rank of a bilinear form. This can be done as follows: Suppose f is a bilinear form on the vector space V. If we fix a vector α in V, then f(α, β) is linear as a function of β. In this way, each fixed α determines a linear functional on V; let us denote this linear functional by L_f(α). To repeat, if α is a vector in V, then L_f(α) is the linear functional on V whose value on any vector β is f(α, β). This gives us a transformation α → L_f(α) from V into the dual space V*. Since

$$f(c\alpha_1 + \alpha_2, \beta) = c\,f(\alpha_1, \beta) + f(\alpha_2, \beta)$$

we see that

$$L_f(c\alpha_1 + \alpha_2) = c\,L_f(\alpha_1) + L_f(\alpha_2)$$

that is, L_f is a linear transformation from V into V*.

In a similar manner, f determines a linear transformation R_f from V into V*. For each fixed β in V, f(α, β) is linear as a function of α. We define R_f(β) to be the linear functional on V whose value on the vector α is f(α, β).


Theorem 2. Let f be a bilinear form on the finite-dimensional vector space V. Let L_f and R_f be the linear transformations from V into V* defined by (L_fα)(β) = f(α, β) = (R_fβ)(α). Then rank (L_f) = rank (R_f).


Proof. One can give a 'coordinate free' proof of this theorem. Such a proof is similar to the proof (in Section 3.7) that the row-rank of a matrix is equal to its column-rank. So, here we shall give a proof which proceeds by choosing a coordinate system (basis) and then using the 'row-rank equals column-rank' theorem.

To prove rank (L_f) = rank (R_f), it will suffice to prove that L_f and R_f have the same nullity. Let ℬ be an ordered basis for V, and let A = [f]_ℬ. If α and β are vectors in V, with coordinate matrices X and Y in the ordered basis ℬ, then f(α, β) = XᵗAY. Now R_f(β) = 0 means that f(α, β) = 0 for every α in V, i.e., that XᵗAY = 0 for every n × 1 matrix X. The latter condition simply says that AY = 0. The nullity of R_f is therefore equal to the dimension of the space of solutions of AY = 0.

Similarly, L_f(α) = 0 if and only if XᵗAY = 0 for every n × 1 matrix Y. Thus α is in the null space of L_f if and only if XᵗA = 0, i.e., AᵗX = 0. The nullity of L_f is therefore equal to the dimension of the space of solutions of AᵗX = 0. Since the matrices A and Aᵗ have the same column-rank, both solution spaces have dimension n − rank A, and we see that

$$\operatorname{nullity}(L_f) = \operatorname{nullity}(R_f). \qquad \blacksquare$$


Definition. If f is a bilinear form on the finite-dimensional space V, the rank of f is the integer r = rank (L_f) = rank (R_f).


Corollary 1. The rank of a bilinear form is equal to the rank of the 
matrix of the form in any ordered basis. 


Corollary 2. If f is a bilinear form on the n-dimensional vector space V, the following are equivalent:

(a) rank (f) = n.
(b) For each non-zero α in V, there is a β in V such that f(α, β) ≠ 0.
(c) For each non-zero β in V, there is an α in V such that f(α, β) ≠ 0.


Proof. Statement (b) simply says that the null space of L_f is the zero subspace. Statement (c) says that the null space of R_f is the zero subspace. The linear transformations L_f and R_f have nullity 0 if and only if they have rank n, i.e., if and only if rank (f) = n. ∎
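Because the rank of f is the rank of any matrix representing it, it can be computed by ordinary row reduction. The following Python sketch is our own illustration: it uses exact arithmetic with `fractions.Fraction`, and the helper names `rank`, `transpose`, `matmul` and `is_non_degenerate` are assumptions. It computes ranks by Gauss-Jordan elimination, checks that the matrix of Example 4 and its transform PᵗAP have the same rank, checks rank A = rank Aᵗ for a non-symmetric matrix (the fact used in Theorem 2), and tests non-degeneracy by comparing the rank with n, as in Corollary 2.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix over Q, computed by Gauss-Jordan elimination with exact fractions."""
    rows = [[Fraction(a) for a in row] for row in M]
    r = 0  # number of pivots found so far
    for col in range(len(rows[0]) if rows else 0):
        # find a row at or below position r with a non-zero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def is_non_degenerate(A):
    """A form is non-degenerate exactly when its matrix (in any basis) has rank n."""
    return rank(A) == len(A)

A = [[1, 1], [1, 1]]                      # the form of Example 4, standard basis
P = [[1, 1], [-1, 1]]
print(rank(A), rank(matmul(matmul(transpose(P), A), P)))   # 1 1
M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]     # a non-symmetric example
print(rank(M), rank(transpose(M)))        # 2 2
print(is_non_degenerate(A), is_non_degenerate([[0, 1], [-1, 0]]))   # False True
```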


Definition. A bilinear form f on a vector space V is called non- 
degenerate (or non-singular) if it satisfies conditions (b) and (c) of 
Corollary 2. 


If V is finite-dimensional, then f is non-degenerate provided f satisfies 
any one of the three conditions of Corollary 2. In particular, f is non- 
degenerate (non-singular) if and only if its matrix in some (every) ordered 
basis for V is a non-singular matrix. 


EXAMPLE 5. Let V = Rⁿ, and let f be the bilinear form defined on α = (x₁, ..., xₙ) and β = (y₁, ..., yₙ) by

$$f(\alpha, \beta) = x_1y_1 + \cdots + x_ny_n.$$

Then f is a non-degenerate bilinear form on Rⁿ. The matrix of f in the standard ordered basis is the n × n identity matrix:

$$f(X, Y) = X^t Y.$$

This f is usually called the dot (or scalar) product. The reader is probably familiar with this bilinear form, at least in the case n = 3. Geometrically, the number f(α, β) is the product of the length of α, the length of β, and the cosine of the angle between α and β. In particular, f(α, β) = 0 if and only if the vectors α and β are orthogonal (perpendicular).


Exercises 


1. Which of the following functions f, defined on vectors α = (x₁, x₂) and β = (y₁, y₂) in R², are bilinear forms?

(a) f(α, β) = 1.
(b) f(α, β) = (x₁ − y₁)² + x₂y₂.
(c) f(α, β) = (x₁ + y₁)² − (x₁ − y₁)².
(d) f(α, β) = x₁y₂ − x₂y₁.


2. Let f be the bilinear form on R² defined by

$$f((x_1, y_1), (x_2, y_2)) = x_1y_1 + x_2y_2.$$

Find the matrix of f in each of the following bases:

{(1, 0), (0, 1)}, {(1, −1), (1, 1)}, {(1, 2), (3, 4)}.


3. Let V be the space of all 2 × 3 matrices over R, and let f be the bilinear form on V defined by f(X, Y) = trace (XᵗAY), where


Find the matrix of f in the ordered basis

$$\{E^{11}, E^{12}, E^{13}, E^{21}, E^{22}, E^{23}\}$$

where Eⁱʲ is the matrix whose only non-zero entry is a 1 in row i and column j.


4. Describe explicitly all bilinear forms f on R³ with the property that f(α, β) = f(β, α) for all α, β.


5. Describe the bilinear forms on R³ which satisfy f(α, β) = −f(β, α) for all α, β.


6. Let n be a positive integer, and let V be the space of all n × n matrices over the field of complex numbers. Show that the equation

$$f(A, B) = n \operatorname{tr}(AB) - \operatorname{tr}(A)\operatorname{tr}(B)$$

defines a bilinear form f on V. Is it true that f(A, B) = f(B, A) for all A, B?

7. Let f be the bilinear form defined in Exercise 6. Show that f is degenerate (not non-degenerate). Let V₁ be the subspace of V consisting of the matrices of trace 0, and let f₁ be the restriction of f to V₁. Show that f₁ is non-degenerate.
8. Let f be the bilinear form defined in Exercise 6, and let V₂ be the subspace of V consisting of all matrices A such that trace (A) = 0 and A* = −A (A* is the conjugate transpose of A). Denote by f₂ the restriction of f to V₂. Show that f₂ is negative definite, i.e., that f₂(A, A) < 0 for each non-zero A in V₂.


9. Let f be the bilinear form defined in Exercise 6. Let W be the set of all matrices 
A in V such that f(A, B) = 0 for all B. Show that W is a subspace of V. Describe 
W explicitly and find its dimension. 


10. Let f be any bilinear form on a finite-dimensional vector space V. Let W be the subspace of all β such that f(α, β) = 0 for every α. Show that

$$\operatorname{rank} f = \dim V - \dim W.$$


Use this result and the result of Exercise 9 to compute the rank of the bilinear 
form defined in Exercise 6. 


11. Let f be a bilinear form on a finite-dimensional vector space V. Suppose V₁ is a subspace of V with the property that the restriction of f to V₁ is non-degenerate. Show that rank f ≥ dim V₁.


12. Let f, g be bilinear forms on a finite-dimensional vector space V. Suppose g is non-singular. Show that there exist unique linear operators T₁, T₂ on V such that

$$f(\alpha, \beta) = g(T_1\alpha, \beta) = g(\alpha, T_2\beta)$$

for all α, β.


13. Show that the result given in Exercise 12 need not be true if g is singular.


14. Let f be a bilinear form on a finite-dimensional vector space V. Show that f can be expressed as a product of two linear functionals (i.e., f(α, β) = L₁(α)L₂(β) for L₁, L₂ in V*) if and only if f has rank 1.
The main purpose of this section is to answer the following question: If f is a bilinear form on the finite-dimensional vector space V, when is there an ordered basis ℬ for V in which f is represented by a diagonal matrix? We prove that this is possible if and only if f is a symmetric bilinear form, i.e., f(α, β) = f(β, α). The theorem is proved only when the scalar field has characteristic zero, that is, that if n is a positive integer the sum 1 + ... + 1 (n times) in F is not 0.


Definition. Let f be a bilinear form on the vector space V. We say that f is symmetric if f(α, β) = f(β, α) for all vectors α, β in V.


If V is finite-dimensional, the bilinear form f is symmetric if and only if its matrix A in some (or every) ordered basis is symmetric, Aᵗ = A. To see this, one inquires when the bilinear form

$$f(X, Y) = X^t A Y$$

is symmetric. This happens if and only if XᵗAY = YᵗAX for all column matrices X and Y. Since XᵗAY is a 1 × 1 matrix, we have XᵗAY = YᵗAᵗX. Thus f is symmetric if and only if YᵗAᵗX = YᵗAX for all X, Y. Clearly this just means that A = Aᵗ. In particular, one should note that if there is an ordered basis for V in which f is represented by a diagonal matrix, then f is symmetric, for any diagonal matrix is a symmetric matrix.

If f is a symmetric bilinear form, the quadratic form associated with f is the function q from V into F defined by

$$q(\alpha) = f(\alpha, \alpha).$$

If F is a subfield of the complex numbers, the symmetric bilinear form f is completely determined by its associated quadratic form, according to the polarization identity

$$f(\alpha, \beta) = \tfrac{1}{4} q(\alpha + \beta) - \tfrac{1}{4} q(\alpha - \beta). \tag{10-5}$$

The establishment of (10-5) is a routine computation, which we omit. If f is the bilinear form of Example 5, the dot product, the associated quadratic form is

$$q(x_1, \ldots, x_n) = x_1^2 + \cdots + x_n^2.$$

In other words, q(α) is the square of the length of α. For the bilinear form f_A(X, Y) = XᵗAY, the associated quadratic form is

$$q_A(X) = X^t A X = \sum_{i,j} A_{ij} x_i x_j.$$
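The polarization identity (10-5) can be checked directly. In this Python sketch (helper names `make_form`, `quadratic` and `polarize` are our own; exact rational arithmetic is assumed) a symmetric bilinear form is rebuilt from its quadratic form. For a non-symmetric matrix, the same recipe returns the symmetric part ½[f(α, β) + f(β, α)] rather than f itself, which is why symmetry is needed for q to determine f.

```python
from fractions import Fraction

def make_form(A):
    """Return the bilinear form f(x, y) = x^t A y on coordinate vectors."""
    n = len(A)
    return lambda x, y: sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def quadratic(f):
    """The quadratic form q(alpha) = f(alpha, alpha) associated with f."""
    return lambda x: f(x, x)

def polarize(q):
    """Rebuild a bilinear form from q using (10-5): f(a, b) = q(a+b)/4 - q(a-b)/4."""
    def f(x, y):
        s = [a + b for a, b in zip(x, y)]
        d = [a - b for a, b in zip(x, y)]
        return (q(s) - q(d)) * Fraction(1, 4)
    return f

A = [[2, 1, 0], [1, -1, 3], [0, 3, 5]]      # a symmetric matrix
f = make_form(A)
g = polarize(quadratic(f))
x, y = [1, 2, -1], [3, 0, 4]
print(f(x, y) == g(x, y))                   # True: q determines f

B = [[0, 1], [0, 0]]                        # not symmetric: f(x, y) = x1 y2
h = polarize(quadratic(make_form(B)))
print(h([1, 0], [0, 1]))                    # 1/2, the value of the symmetric part
```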


One important class of symmetric bilinear forms consists of the inner products on real vector spaces, discussed in Chapter 8. If V is a real vector space, an inner product on V is a symmetric bilinear form f on V which satisfies

$$f(\alpha, \alpha) > 0 \quad \text{if } \alpha \neq 0. \tag{10-6}$$

A bilinear form satisfying (10-6) is called positive definite. Thus, an inner product on a real vector space is a positive definite, symmetric bilinear form on that space. Note that an inner product is non-degenerate. Two vectors α, β are called orthogonal with respect to the inner product f if f(α, β) = 0. The quadratic form q(α) = f(α, α) takes only non-negative values, and q(α) is usually thought of as the square of the length of α. Of course, these concepts of length and orthogonality stem from the most important example of an inner product, the dot product of Example 5.

If f is any symmetric bilinear form on a vector space V, it is convenient to apply some of the terminology of inner products to f. It is especially convenient to say that α and β are orthogonal with respect to f if f(α, β) = 0. It is not advisable to think of f(α, α) as the square of the length of α; for example, if V is a complex vector space, we may have f(α, α) = √−1, or on a real vector space, f(α, α) = −2.

We turn now to the basic theorem of this section. In reading the proof, the reader should find it helpful to think of the special case in which V is a real vector space and f is an inner product on V.


Theorem 3. Let V be a finite-dimensional vector space over a field of characteristic zero, and let f be a symmetric bilinear form on V. Then there is an ordered basis for V in which f is represented by a diagonal matrix.


Proof. What we must find is an ordered basis

$$\mathcal{B} = \{\alpha_1, \ldots, \alpha_n\}$$

such that f(αᵢ, αⱼ) = 0 for i ≠ j. If f = 0 or n = 1, the theorem is obviously true. Thus we may suppose f ≠ 0 and n > 1. If f(α, α) = 0 for every α in V, the associated quadratic form q is identically 0, and the polarization identity (10-5) shows that f = 0. Thus there is a vector α in V such that f(α, α) = q(α) ≠ 0. Let W be the one-dimensional subspace of V which is spanned by α, and let W⊥ be the set of all vectors β in V such that f(α, β) = 0. Now we claim that V = W ⊕ W⊥. Certainly the subspaces W and W⊥ are independent. A typical vector in W is cα, where c is a scalar. If cα is also in W⊥, then f(cα, cα) = c²f(α, α) = 0. But f(α, α) ≠ 0, thus c = 0. Also, each vector in V is the sum of a vector in W and a vector in W⊥. For, let γ be any vector in V, and put

$$\beta = \gamma - \frac{f(\gamma, \alpha)}{f(\alpha, \alpha)}\,\alpha.$$

Then

$$f(\alpha, \beta) = f(\alpha, \gamma) - \frac{f(\gamma, \alpha)}{f(\alpha, \alpha)}\,f(\alpha, \alpha)$$

and since f is symmetric, f(α, β) = 0. Thus β is in the subspace W⊥. The expression

$$\gamma = \frac{f(\gamma, \alpha)}{f(\alpha, \alpha)}\,\alpha + \beta$$

shows us that V = W + W⊥.

The restriction of f to W⊥ is a symmetric bilinear form on W⊥. Since W⊥ has dimension (n − 1), we may assume by induction that W⊥ has a basis {α₂, ..., αₙ} such that

$$f(\alpha_i, \alpha_j) = 0, \qquad i \neq j \quad (i \ge 2,\ j \ge 2).$$

Putting α₁ = α, we obtain a basis {α₁, ..., αₙ} for V such that f(αᵢ, αⱼ) = 0 for i ≠ j. ∎


Corollary. Let F be a subfield of the complex numbers, and let A be a symmetric n × n matrix over F. Then there is an invertible n × n matrix P over F such that PᵗAP is diagonal.
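The proof of Theorem 3 is constructive, and its inductive step can be turned into a procedure. The following Python sketch is our own illustration under stated assumptions: it works over the rational numbers (via `fractions.Fraction`), represents f by a symmetric matrix A in the standard basis, and uses hypothetical helper names (`form`, `find_anisotropic`, `diagonalize`). At each stage it picks a vector α with f(α, α) ≠ 0 (using α = v + w when every f(v, v) vanishes but some f(v, w) does not, which is where characteristic zero enters), then replaces each remaining vector γ by γ − (f(γ, α)/f(α, α))α, as in the proof. The columns of the returned matrix P form the new basis, so PᵗAP is diagonal, as the Corollary asserts.

```python
from fractions import Fraction

def form(A, x, y):
    """f(x, y) = x^t A y for coordinate vectors x, y."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def find_anisotropic(A, rest):
    """Find alpha in span(rest) with f(alpha, alpha) != 0.

    Returns (k, alpha) where rest[k] may be replaced by alpha without changing
    the span, or None if f vanishes identically on span(rest)."""
    for k, v in enumerate(rest):
        if form(A, v, v) != 0:
            return k, v
    for i in range(len(rest)):
        for j in range(i + 1, len(rest)):
            if form(A, rest[i], rest[j]) != 0:
                # q(v_i + v_j) = 2 f(v_i, v_j) != 0 because the field has characteristic zero
                return i, [a + b for a, b in zip(rest[i], rest[j])]
    return None

def diagonalize(A):
    """Return an invertible P over Q with P^t A P diagonal (A must be symmetric)."""
    n = len(A)
    A = [[Fraction(a) for a in row] for row in A]
    rest = [[Fraction(int(i == j)) for i in range(n)] for j in range(n)]  # standard basis
    basis = []
    while rest:
        found = find_anisotropic(A, rest)
        if found is None:
            basis.extend(rest)       # f is zero on span(rest): any basis of it will do
            break
        k, alpha = found
        qa = form(A, alpha, alpha)
        basis.append(alpha)
        # Move the other vectors into W-perp: gamma - f(gamma, alpha)/f(alpha, alpha) * alpha
        new_rest = []
        for idx, gamma in enumerate(rest):
            if idx != k:
                c = form(A, gamma, alpha) / qa
                new_rest.append([g - c * a for g, a in zip(gamma, alpha)])
        rest = new_rest
    # The columns of P are the new basis vectors alpha_1, ..., alpha_n.
    return [[basis[j][i] for j in range(n)] for i in range(n)]

def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A = [[0, 1], [1, 0]]         # every diagonal entry is zero, so the v + w step is needed
P = diagonalize(A)
print(matmul(matmul(transpose(P), A), P))   # diagonal, with entries 2 and -1/2 (as Fractions)
```

This sketch only illustrates the existence argument; over the real numbers, Chapter 8 gives the stronger statement that P can even be chosen orthogonal.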


In case F is the field of real numbers, the invertible matrix P in this corollary can be chosen to be an orthogonal matrix, i.e., Pᵗ = P⁻¹. In other words, if A is a real symmetric n × n matrix, there is a real orthogonal matrix P such that PᵗAP is diagonal; however, this is not at all apparent from what we did above (see Chapter 8).


Theorem 4. Let V be a finite-dimensional vector space over the field of complex numbers. Let f be a symmetric bilinear form on V which has rank r. Then there is an ordered basis ℬ = {β₁, ..., βₙ} for V such that

(i) the matrix of f in the ordered basis ℬ is diagonal;
(ii)
$$f(\beta_j, \beta_j) = \begin{cases} 1, & j = 1, \ldots, r\\ 0, & j > r. \end{cases}$$


Proof. By Theorem 3, there is an ordered basis {α₁, ..., αₙ} for V such that

$$f(\alpha_i, \alpha_j) = 0 \quad \text{for } i \neq j.$$

Since f has rank r, so does its matrix in the ordered basis {α₁, ..., αₙ}. Thus we must have f(αⱼ, αⱼ) ≠ 0 for precisely r values of j. By reordering the vectors αⱼ, we may assume that

$$f(\alpha_j, \alpha_j) \neq 0, \qquad j = 1, \ldots, r.$$

Now we use the fact that the scalar field is the field of complex numbers. If $\sqrt{f(\alpha_j, \alpha_j)}$ denotes any complex square root of f(αⱼ, αⱼ), and if we put

$$\beta_j = \begin{cases} \dfrac{1}{\sqrt{f(\alpha_j, \alpha_j)}}\,\alpha_j, & j = 1, \ldots, r\\[4pt] \alpha_j, & j > r \end{cases}$$

the basis {β₁, ..., βₙ} satisfies conditions (i) and (ii). ∎
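The scaling step in the proof of Theorem 4 can be sketched as follows, assuming we are already given an invertible complex matrix P whose columns are f-orthogonal (for instance, one produced by the inductive procedure of Theorem 3). The names `form`, `gram`, `normalize_complex` and `close` are hypothetical helpers of our own, and floating-point complex arithmetic (`cmath.sqrt`) is used, so results agree with the exact ones only up to rounding. Note that f is bilinear, not Hermitian: the code uses the plain transpose, never the conjugate transpose.

```python
import cmath

def form(A, x, y):
    """f(x, y) = x^t A y (plain transpose: f is bilinear, not Hermitian)."""
    n = len(A)
    return sum(x[i] * A[i][k] * y[k] for i in range(n) for k in range(n))

def gram(A, Q):
    """Matrix of f in the basis formed by the columns of Q, i.e. Q^t A Q."""
    n = len(A)
    cols = [[Q[i][j] for i in range(n)] for j in range(n)]
    return [[form(A, u, v) for v in cols] for u in cols]

def normalize_complex(A, P, tol=1e-12):
    """Rescale the f-orthogonal columns of P as in the proof of Theorem 4.

    Columns with f(a, a) != 0 are divided by a complex square root of f(a, a)
    and moved to the front; the remaining columns are kept unchanged."""
    n = len(A)
    cols = [[P[i][j] for i in range(n)] for j in range(n)]
    scaled, zero = [], []
    for v in cols:
        d = form(A, v, v)
        if abs(d) > tol:
            root = cmath.sqrt(d)    # any square root would do; cmath returns the principal one
            scaled.append([x / root for x in v])
        else:
            zero.append(v)
    new_cols = scaled + zero
    return [[new_cols[j][i] for j in range(n)] for i in range(n)]

def close(M, N, tol=1e-9):
    """Entrywise comparison of two matrices up to rounding error."""
    return all(abs(a - b) < tol for rm, rn in zip(M, N) for a, b in zip(rm, rn))

A = [[0, 1], [1, 0]]          # f(alpha, beta) = x1 y2 + x2 y1 on C^2
P = [[1, 1], [1, -1]]         # columns (1, 1), (1, -1): f-orthogonal, values 2 and -2
Q = normalize_complex(A, P)
print(close(gram(A, Q), [[1, 0], [0, 1]]))   # True
```

Over the real numbers the second column, with f-value −2, could only be rescaled to −1; that is exactly the situation handled by Theorem 5.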


Of course, Theorem 4 is valid if the scalar field is any subfield of the 
complex numbers in which each element has a square root. It is not valid, 
for example, when the scalar field is the field of real numbers. Over the 
field of real numbers, we have the following substitute for Theorem 4. 


Theorem 5. Let V be an n-dimensional vector space over the field of real numbers, and let f be a symmetric bilinear form on V which has rank r. Then there is an ordered basis {β₁, β₂, ..., βₙ} for V in which the matrix of f is diagonal and such that

$$f(\beta_j, \beta_j) = \pm 1, \qquad j = 1, \ldots, r.$$

Furthermore, the number of basis vectors βⱼ for which f(βⱼ, βⱼ) = 1 is independent of the choice of basis.


Proof. There is a basis {α₁, ..., αₙ} for V such that

$$\begin{aligned} f(\alpha_i, \alpha_j) &= 0, & i &\neq j\\ f(\alpha_j, \alpha_j) &\neq 0, & 1 &\le j \le r\\ f(\alpha_j, \alpha_j) &= 0, & j &> r. \end{aligned}$$

Let

$$\begin{aligned} \beta_j &= |f(\alpha_j, \alpha_j)|^{-1/2}\,\alpha_j, & 1 &\le j \le r\\ \beta_j &= \alpha_j, & j &> r. \end{aligned}$$

Then {β₁, ..., βₙ} is a basis with the stated properties.

Let p be the number of basis vectors βⱼ for which f(βⱼ, βⱼ) = 1; we must show that the number p is independent of the particular basis we have, satisfying the stated conditions. Let V⁺ be the subspace of V spanned by the basis vectors βⱼ for which f(βⱼ, βⱼ) = 1, and let V⁻ be the subspace spanned by the basis vectors βⱼ for which f(βⱼ, βⱼ) = −1. Now p = dim V⁺, so it is the uniqueness of the dimension of V⁺ which we must demonstrate. It is easy to see that if α is a non-zero vector in V⁺, then f(α, α) > 0; in other words, f is positive definite on the subspace V⁺. Similarly, if α is a non-zero vector in V⁻, then f(α, α) < 0, i.e., f is negative definite on the subspace V⁻. Now let V⊥ be the subspace spanned by the basis vectors βⱼ for which f(βⱼ, βⱼ) = 0. If α is in V⊥, then f(α, β) = 0 for all β in V.

Since {β₁, ..., βₙ} is a basis for V, we have

$$V = V^+ \oplus V^- \oplus V^\perp.$$

Furthermore, we claim that if W is any subspace of V on which f is positive definite, then the subspaces W, V⁻, and V⊥ are independent. For, suppose α is in W, β is in V⁻, γ is in V⊥, and α + β + γ = 0. Then

$$\begin{aligned} 0 &= f(\alpha, \alpha + \beta + \gamma) = f(\alpha, \alpha) + f(\alpha, \beta) + f(\alpha, \gamma)\\ 0 &= f(\beta, \alpha + \beta + \gamma) = f(\beta, \alpha) + f(\beta, \beta) + f(\beta, \gamma). \end{aligned}$$

Since γ is in V⊥, f(α, γ) = f(β, γ) = 0; and since f is symmetric, we obtain

$$\begin{aligned} 0 &= f(\alpha, \alpha) + f(\alpha, \beta)\\ 0 &= f(\beta, \beta) + f(\alpha, \beta) \end{aligned}$$

hence f(α, α) = f(β, β). Since f(α, α) ≥ 0 and f(β, β) ≤ 0, it follows that

$$f(\alpha, \alpha) = f(\beta, \beta) = 0.$$

But f is positive definite on W and negative definite on V⁻. We conclude that α = β = 0, and hence that γ = 0 as well.

Since

$$V = V^+ \oplus V^- \oplus V^\perp$$

and W, V⁻, V⊥ are independent, we have dim W + dim V⁻ + dim V⊥ ≤ dim V = dim V⁺ + dim V⁻ + dim V⊥, so dim W ≤ dim V⁺. That is, if W is any subspace of V on which f is positive definite, the dimension of W cannot exceed the dimension of V⁺. If ℬ₁ is another ordered basis for V which satisfies the conditions of the theorem, we shall have corresponding subspaces V₁⁺, V₁⁻, and V₁⊥; and, the argument above shows that dim V₁⁺ ≤ dim V⁺. Reversing the argument, we obtain dim V⁺ ≤ dim V₁⁺, and consequently

$$\dim V^+ = \dim V_1^+. \qquad \blacksquare$$




Bilinear Forms Chap. 10 


There are several comments we should make about the basis {β₁, ..., βₙ} of Theorem 5 and the associated subspaces V⁺, V⁻, and V⊥. First, note that V⊥ is exactly the subspace of vectors which are 'orthogonal' to all of V. We noted above that V⊥ is contained in this subspace; but the subspace of all such vectors is the null space of L_f, whose dimension is dim V − rank f, while

$$\dim V^\perp = \dim V - (\dim V^+ + \dim V^-) = \dim V - \operatorname{rank} f$$

so every vector α such that f(α, β) = 0 for all β must be in V⊥. Thus, the subspace V⊥ is unique. The subspaces V⁺ and V⁻ are not unique; however, their dimensions are unique. The proof of Theorem 5 shows us that dim V⁺ is the largest possible dimension of any subspace on which f is positive definite. Similarly, dim V⁻ is the largest dimension of any subspace on which f is negative definite. Of course

$$\dim V^+ + \dim V^- = \operatorname{rank} f.$$

The number

$$\dim V^+ - \dim V^-$$

is often called the signature of f. It is introduced because the dimensions of V⁺ and V⁻ are easily determined from the rank of f and the signature of f.
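The rescaling in the proof of Theorem 5, and the invariance of p, can be illustrated with a small Python sketch. The helper names (`form`, `normalize_real`, `signature_data`) are our own; the input is assumed to be a real symmetric matrix A together with an invertible P whose columns are f-orthogonal, and floating-point arithmetic with a small tolerance is used. Two different diagonalizing bases of the same form yield the same dim V⁺, dim V⁻, rank and signature, as the theorem predicts.

```python
import math

def form(A, x, y):
    """f(x, y) = x^t A y for real coordinate vectors."""
    n = len(A)
    return sum(x[i] * A[i][k] * y[k] for i in range(n) for k in range(n))

def normalize_real(A, P, tol=1e-12):
    """Rescale the f-orthogonal columns of P as in the proof of Theorem 5.

    Returns (basis, p, m): the new basis vectors, ordered so that those with
    f(b, b) = +1 come first, then those with -1, then those with 0, plus the
    counts p = dim V+ and m = dim V-."""
    n = len(A)
    cols = [[P[i][j] for i in range(n)] for j in range(n)]
    plus, minus, zero = [], [], []
    for v in cols:
        d = form(A, v, v)
        if d > tol:
            plus.append([x / math.sqrt(d) for x in v])
        elif d < -tol:
            minus.append([x / math.sqrt(-d) for x in v])
        else:
            zero.append(v)
    return plus + minus + zero, len(plus), len(minus)

def signature_data(A, P):
    """Rank, dim V+, dim V- and signature of f, read off from a diagonalizing basis."""
    _, p, m = normalize_real(A, P)
    return {"rank": p + m, "dim V+": p, "dim V-": m, "signature": p - m}

A = [[0, 1], [1, 0]]
print(signature_data(A, [[1, 1], [1, -1]]))      # columns (1, 1), (1, -1)
print(signature_data(A, [[1, -0.5], [1, 0.5]]))  # columns (1, 1), (-0.5, 0.5)
# Both lines print {'rank': 2, 'dim V+': 1, 'dim V-': 1, 'signature': 0}
```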

Perhaps we should make one final comment about the relation of symmetric bilinear forms on real vector spaces to inner products. Suppose V is a finite-dimensional real vector space and that V₁, V₂, V₃ are subspaces of V such that

$$V = V_1 \oplus V_2 \oplus V_3.$$

Suppose that f₁ is an inner product on V₁, and f₂ is an inner product on V₂. We can then define a symmetric bilinear form f on V as follows: If α, β are vectors in V, then we can write

$$\alpha = \alpha_1 + \alpha_2 + \alpha_3 \quad\text{and}\quad \beta = \beta_1 + \beta_2 + \beta_3$$

with αⱼ and βⱼ in Vⱼ. Let

$$f(\alpha, \beta) = f_1(\alpha_1, \beta_1) - f_2(\alpha_2, \beta_2).$$

The subspace V⊥ for f will be V₃, V₁ is a suitable V⁺ for f, and V₂ is a suitable V⁻. One part of the statement of Theorem 5 is that every symmetric bilinear form on V arises in this way. The additional content of the theorem is that an inner product is represented in some ordered basis by the identity matrix.


Exercises 


1. The following expressions define quadratic forms q on R². Find the symmetric bilinear form f corresponding to each q.

(a) ax₁².
(b) bx₁x₂.
(c) cx₂².
(d) 2x₁² − ⅓x₁x₂.
(e) x₁² + 9x₂².
(f) 3x₁x₂ − x₂².
(g) 4x₁² + 6x₁x₂ − 3x₂².


2. Find the matrix, in the standard ordered basis, and the rank of each of the 
bilinear forms determined in Exercise 1. Indicate which forms are non-degenerate. 


3. Let q(x₁, x₂) = ax₁² + bx₁x₂ + cx₂² be the quadratic form associated with a symmetric bilinear form f on R². Show that f is non-degenerate if and only if b² − 4ac ≠ 0.


4. Let V be a finite-dimensional vector space over a subfield F of the complex 
numbers, and let S be the set of all symmetric bilinear forms on V. 


(a) Show that S is a subspace of L(V, V, F). 
(b) Find dim S. 


Let Q be the set of all quadratic forms on V. 


(c) Show that Q is a subspace of the space of all functions from V into F. 

(d) Describe explicitly an isomorphism T of Q onto S, without reference to 
a basis. 

(e) Let U be a linear operator on V and q an element of Q. Show that the equation (U†q)(α) = q(Uα) defines a quadratic form U†q on V.

(f) If U is a linear operator on V, show that the function U† defined in part (e) is a linear operator on Q. Show that U† is invertible if and only if U is invertible.


5. Let q be the quadratic form on R² given by

$$q(x_1, x_2) = ax_1^2 + 2bx_1x_2 + cx_2^2, \qquad a \neq 0.$$

Find an invertible linear operator U on R² such that

$$(U^\dagger q)(x_1, x_2) = ax_1^2 + \Big(c - \frac{b^2}{a}\Big)x_2^2.$$

(Hint: To find U⁻¹ (and hence U), complete the square. For the definition of U†, see part (e) of Exercise 4.)

6. Let q be the quadratic form on R² given by

$$q(x_1, x_2) = 2bx_1x_2.$$

Find an invertible linear operator U on R² such that

$$(U^\dagger q)(x_1, x_2) = 2bx_1^2 - 2bx_2^2.$$

7. Let q be the quadratic form on R³ given by

$$q(x_1, x_2, x_3) = x_1x_2 + 2x_1x_3 + x_3^2.$$

Find an invertible linear operator U on R³ such that

$$(U^\dagger q)(x_1, x_2, x_3) = x_1^2 - x_2^2 + x_3^2.$$

(Hint: Express U as a product of operators similar to those used in Exercises 5 and 6.)
8. Let A be a symmetric n × n matrix over R, and let q be the quadratic form on Rⁿ given by

$$q(x_1, \ldots, x_n) = \sum_{i,j} A_{ij} x_i x_j.$$

Generalize the method used in Exercise 7 to show that there is an invertible linear operator U on Rⁿ such that

$$(U^\dagger q)(x_1, \ldots, x_n) = \sum_{i=1}^{n} c_i x_i^2$$

where cᵢ is 1, −1, or 0, i = 1, ..., n.

9. Let f be a symmetric bilinear form on Rⁿ. Use the result of Exercise 8 to prove the existence of an ordered basis ℬ such that [f]_ℬ is diagonal.
10. Let V be the real vector space of all 2 × 2 (complex) Hermitian matrices, that is, 2 × 2 complex matrices A which satisfy $A_{ij} = \overline{A_{ji}}$.

(a) Show that the equation q(A) = det A defines a quadratic form q on V.
(b) Let W be the subspace of V of matrices of trace 0. Show that the bilinear form f determined by q is negative definite on the subspace W.


11. Let V be a finite-dimensional vector space and f a non-degenerate symmetric bilinear form on V. Show that for each linear operator T on V there is a unique linear operator T' on V such that f(Tα, β) = f(α, T'β) for all α, β in V. Also show that

$$\begin{aligned} (T_1T_2)' &= T_2'T_1'\\ (c_1T_1 + c_2T_2)' &= c_1T_1' + c_2T_2'\\ (T')' &= T. \end{aligned}$$

How much of the above is valid without the assumption that f is non-degenerate?


12. Let F be a field and V the space of n × 1 matrices over F. Suppose A is a fixed n × n matrix over F and f is the bilinear form on V defined by f(X, Y) = XᵗAY. Suppose f is symmetric and non-degenerate. Let B be an n × n matrix over F and T the linear operator on V sending X into BX. Find the operator T' of Exercise 11.

13. Let V be a finite-dimensional vector space and f a non-degenerate symmetric bilinear form on V. Associated with f is a 'natural' isomorphism of V onto the dual space V*, this isomorphism being the transformation L_f of Section 10.1. Using L_f, show that for each basis ℬ = {α₁, ..., αₙ} of V there exists a unique basis ℬ' = {α'₁, ..., α'ₙ} of V such that f(αᵢ, α'ⱼ) = δᵢⱼ. Then show that for every vector α in V we have

$$\alpha = \sum_i f(\alpha, \alpha'_i)\,\alpha_i = \sum_i f(\alpha_i, \alpha)\,\alpha'_i.$$


14. Let V, f, ℬ, and ℬ' be as in Exercise 13. Suppose T is a linear operator on V and that T' is the operator which f associates with T as in Exercise 11. Show that

(a) $[T']_{\mathcal{B}'} = [T]_{\mathcal{B}}^t$.
(b) $\operatorname{tr}(T) = \operatorname{tr}(T') = \sum_i f(T\alpha_i, \alpha'_i)$.


15. Let V, f, ℬ, and ℬ' be as in Exercise 13. Suppose [f]_ℬ = A. Show that

$$\alpha'_i = \sum_j (A^{-1})_{ij}\,\alpha_j = \sum_j (A^{-1})_{ji}\,\alpha_j.$$
16. Let F be a field and V the space of n × 1 matrices over F. Suppose A is an invertible, symmetric n × n matrix over F and that f is the bilinear form on V defined by f(X, Y) = XᵗAY. Let P be an invertible n × n matrix over F and ℬ the basis for V consisting of the columns of P. Show that the basis ℬ' of Exercise 13 consists of the columns of the matrix A⁻¹(Pᵗ)⁻¹.


17. Let V be a finite-dimensional vector space over a field F and f a symmetric bilinear form on V. For each subspace W of V, let W⊥ be the set of all vectors α in V such that f(α, β) = 0 for every β in W. Show that

(a) W⊥ is a subspace.
(b) V = {0}⊥.
(c) V⊥ = {0} if and only if f is non-degenerate.
(d) rank f = dim V − dim V⊥.
(e) If dim V = n and dim W = m, then dim W⊥ ≥ n − m. (Hint: Let {β₁, ..., βₘ} be a basis of W and consider the mapping

$$\alpha \to (f(\alpha, \beta_1), \ldots, f(\alpha, \beta_m))$$

of V into Fᵐ.)
(f) The restriction of f to W is non-degenerate if and only if W ∩ W⊥ = {0}.
(g) V = W ⊕ W⊥ if and only if the restriction of f to W is non-degenerate.

18. Let V be a finite-dimensional vector space over C and f a non-degenerate symmetric bilinear form on V. Prove that there is a basis ℬ of V such that ℬ' = ℬ. (See Exercise 13 for a definition of ℬ'.)


10.3. Skew-Symmetric Bilinear Forms 


Throughout this section V will be a vector space over a subfield F of the field of complex numbers. A bilinear form f on V is called skew-symmetric if f(α, β) = −f(β, α) for all vectors α, β in V. We shall prove one theorem concerning the simplification of the matrix of a skew-symmetric bilinear form on a finite-dimensional space V. First, let us make some general observations.

Suppose $f$ is any bilinear form on $V$. If we let

$$g(\alpha, \beta) = \tfrac{1}{2}[f(\alpha, \beta) + f(\beta, \alpha)]$$
$$h(\alpha, \beta) = \tfrac{1}{2}[f(\alpha, \beta) - f(\beta, \alpha)]$$

then it is easy to verify that $g$ is a symmetric bilinear form on $V$ and $h$ is
a skew-symmetric bilinear form on $V$. Also $f = g + h$. Furthermore, this
expression for $f$ as the sum of a symmetric and a skew-symmetric form
is unique. Thus, the space $L(V, V, F)$ is the direct sum of the subspace
of symmetric forms and the subspace of skew-symmetric forms.
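For a concrete check of the decomposition $f = g + h$, here is a small Python sketch. It assumes $f$ is given by its matrix $A$ in some ordered basis, so that $g$ and $h$ have matrices $(A + A^t)/2$ and $(A - A^t)/2$. The helper names `transpose` and `split_form` are our own, and exact arithmetic comes from the standard `fractions` module.

```python
from fractions import Fraction

# Illustrative sketch; the helper names are ours, not the text's.

def transpose(A):
    """Transpose of a square matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def split_form(A):
    """Split the matrix A of a bilinear form f into (S, K).

    S = (A + A^t)/2 is the matrix of the symmetric part g,
    K = (A - A^t)/2 is the matrix of the skew-symmetric part h,
    so that S + K == A.
    """
    n = len(A)
    At = transpose(A)
    S = [[Fraction(A[i][j] + At[i][j], 2) for j in range(n)] for i in range(n)]
    K = [[Fraction(A[i][j] - At[i][j], 2) for j in range(n)] for i in range(n)]
    return S, K

# Usage: an arbitrary 3 x 3 matrix of a bilinear form on Q^3.
A = [[1, 4, 0], [2, 3, -1], [5, 7, 2]]
S, K = split_form(A)
```

Uniqueness is visible from the same computation: if $S' + K' = A$ with $S'$ symmetric and $K'$ skew-symmetric, transposing gives $S' - K' = A^t$, which forces $S' = (A + A^t)/2$ and $K' = (A - A^t)/2$.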
If $V$ is finite-dimensional, the bilinear form $f$ is skew-symmetric if
and only if its matrix $A$ in some (or every) ordered basis is skew-symmetric,
$A^t = -A$. This is proved just as one proves the corresponding fact about
symmetric bilinear forms. When $f$ is skew-symmetric, the matrix of $f$ in
any ordered basis will have all its diagonal entries 0. This just corresponds
to the observation that $f(\alpha, \alpha) = 0$ for every $\alpha$ in $V$, since $f(\alpha, \alpha) = -f(\alpha, \alpha)$.

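Let us suppose $f$ is a non-zero skew-symmetric bilinear form on $V$.
Since $f \neq 0$, there are vectors $\alpha$, $\beta$ in $V$ such that $f(\alpha, \beta) \neq 0$. Multiplying
$\alpha$ by a suitable scalar, we may assume that $f(\alpha, \beta) = 1$. Let $\gamma$ be any vector
in the subspace spanned by $\alpha$ and $\beta$, say $\gamma = c\alpha + d\beta$. Then

$$f(\gamma, \alpha) = f(c\alpha + d\beta, \alpha) = d f(\beta, \alpha) = -d$$
$$f(\gamma, \beta) = f(c\alpha + d\beta, \beta) = c f(\alpha, \beta) = c$$

and so

$$\gamma = f(\gamma, \beta)\alpha - f(\gamma, \alpha)\beta. \tag{10-7}$$

In particular, note that $\alpha$ and $\beta$ are necessarily linearly independent; for,
if $\gamma = 0$, then $f(\gamma, \alpha) = f(\gamma, \beta) = 0$, that is, $c = d = 0$.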

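Let $W$ be the two-dimensional subspace spanned by $\alpha$ and $\beta$. Let $W^\perp$
be the set of all vectors $\delta$ in $V$ such that $f(\delta, \alpha) = f(\delta, \beta) = 0$, that is, the
set of all $\delta$ such that $f(\delta, \gamma) = 0$ for every $\gamma$ in the subspace $W$. We claim
that $V = W \oplus W^\perp$. For, let $\epsilon$ be any vector in $V$, and

$$\gamma = f(\epsilon, \beta)\alpha - f(\epsilon, \alpha)\beta$$
$$\delta = \epsilon - \gamma.$$

Then $\gamma$ is in $W$, and $\delta$ is in $W^\perp$, for

$$\begin{aligned}
f(\delta, \alpha) &= f(\epsilon - f(\epsilon, \beta)\alpha + f(\epsilon, \alpha)\beta, \alpha) \\
&= f(\epsilon, \alpha) + f(\epsilon, \alpha)f(\beta, \alpha) \\
&= 0
\end{aligned}$$

and similarly $f(\delta, \beta) = 0$. Thus every $\epsilon$ in $V$ is of the form $\epsilon = \gamma + \delta$,
with $\gamma$ in $W$ and $\delta$ in $W^\perp$. From (10-7) it is clear that $W \cap W^\perp = \{0\}$, and
so $V = W \oplus W^\perp$.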

Now the restriction of $f$ to $W^\perp$ is a skew-symmetric bilinear form on
$W^\perp$. This restriction may be the zero form. If it is not, there are vectors
$\alpha'$ and $\beta'$ in $W^\perp$ such that $f(\alpha', \beta') = 1$. If we let $W'$ be the two-dimensional
subspace spanned by $\alpha'$ and $\beta'$, then we shall have

$$V = W \oplus W' \oplus W_0$$

where $W_0$ is the set of all vectors $\delta$ in $W^\perp$ such that $f(\alpha', \delta) = f(\beta', \delta) = 0$.
If the restriction of $f$ to $W_0$ is not the zero form, we may select vectors
$\alpha''$, $\beta''$ in $W_0$ such that $f(\alpha'', \beta'') = 1$, and continue.

In the finite-dimensional case it should be clear that we obtain a
finite sequence of pairs of vectors,

$$(\alpha_1, \beta_1), (\alpha_2, \beta_2), \ldots, (\alpha_k, \beta_k)$$

with the following properties:

(a) $f(\alpha_j, \beta_j) = 1$, $j = 1, \ldots, k$.
(b) $f(\alpha_i, \alpha_j) = f(\beta_i, \beta_j) = f(\alpha_i, \beta_j) = 0$, $i \neq j$.
(c) If $W_j$ is the two-dimensional subspace spanned by $\alpha_j$ and $\beta_j$, then

$$V = W_1 \oplus \cdots \oplus W_k \oplus W_0$$

where every vector in $W_0$ is 'orthogonal' to all $\alpha_j$ and $\beta_j$, and the restriction
of $f$ to $W_0$ is the zero form.


Theorem 6. Let $V$ be an $n$-dimensional vector space over a subfield of
the complex numbers, and let $f$ be a skew-symmetric bilinear form on $V$. Then
the rank $r$ of $f$ is even, and if $r = 2k$ there is an ordered basis for $V$ in which
the matrix of $f$ is the direct sum of the $(n - r) \times (n - r)$ zero matrix and
$k$ copies of the $2 \times 2$ matrix

$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$


Proof. Let $\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k$ be vectors satisfying conditions (a),
(b), and (c) above. Let $\{\gamma_1, \ldots, \gamma_s\}$ be any ordered basis for the subspace
$W_0$. Then

$$\mathcal{B} = \{\alpha_1, \beta_1, \alpha_2, \beta_2, \ldots, \alpha_k, \beta_k, \gamma_1, \ldots, \gamma_s\}$$

is an ordered basis for $V$. From (a), (b), and (c) it is clear that the matrix
of $f$ in the ordered basis $\mathcal{B}$ is the direct sum of the $(n - 2k) \times (n - 2k)$
zero matrix and $k$ copies of the $2 \times 2$ matrix

$$\begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}. \tag{10-8}$$

Furthermore, it is clear that the rank of this matrix, and hence the rank
of $f$, is $2k$. ■
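The inductive construction behind Theorem 6 can be carried out mechanically. The Python sketch below is our own rendering of it for a form $f(X, Y) = X^tAY$ with a skew-symmetric matrix $A$ of rational entries. It repeatedly picks a pair with $f(\alpha, \beta) \neq 0$, rescales so that $f(\alpha, \beta) = 1$, and moves every remaining vector into $W^\perp$ by $\delta = \epsilon - f(\epsilon, \beta)\alpha + f(\epsilon, \alpha)\beta$. The names `form`, `skew_normal_basis` and `matrix_in_basis` are hypothetical, and exact `Fraction` arithmetic is assumed so that the test $f \neq 0$ is reliable.

```python
from fractions import Fraction

# Illustrative sketch of the construction in the text; names are ours.

def form(A, x, y):
    """f(x, y) = x^t A y for coordinate vectors x, y given as lists."""
    n = len(A)
    return sum(x[i] * A[i][j] * y[j] for i in range(n) for j in range(n))

def skew_normal_basis(A):
    """Return (basis, r): basis = [a1, b1, ..., ak, bk, g1, ..., gs] satisfies
    conditions (a), (b), (c) of the text, and r = 2k is the rank of f.
    A must be skew-symmetric with rational (or integer) entries."""
    n = len(A)
    # 'rest' is always a basis of the subspace still to be split up;
    # initially this is the standard basis of the whole space.
    rest = [[Fraction(int(i == k)) for i in range(n)] for k in range(n)]
    pairs = []
    while True:
        # Find a pair with f(u, v) != 0; since f(u, u) = 0, automatically i != j.
        hit = next(((i, j) for i in range(len(rest)) for j in range(len(rest))
                    if form(A, rest[i], rest[j]) != 0), None)
        if hit is None:
            break  # f is the zero form on span(rest): this span is W_0
        i, j = hit
        c = form(A, rest[i], rest[j])
        alpha = [a / c for a in rest[i]]  # rescale so that f(alpha, beta) = 1
        beta = rest[j]
        pairs.append((alpha, beta))
        # Move every other vector into W-perp:
        # delta = eps - f(eps, beta) alpha + f(eps, alpha) beta.
        projected = []
        for k, eps in enumerate(rest):
            if k in (i, j):
                continue
            fb, fa = form(A, eps, beta), form(A, eps, alpha)
            projected.append([e - fb * a + fa * b
                              for e, a, b in zip(eps, alpha, beta)])
        rest = projected
    basis = [v for pair in pairs for v in pair] + rest
    return basis, 2 * len(pairs)

def matrix_in_basis(A, basis):
    """Matrix of f in the ordered basis: entry (p, q) is f(basis[p], basis[q])."""
    return [[form(A, u, v) for v in basis] for u in basis]

# Usage: a skew-symmetric form of rank 2 on Q^3.
A = [[0, 2, 1], [-2, 0, 3], [-1, -3, 0]]
basis, r = skew_normal_basis(A)
```

For this $3 \times 3$ example, `r` is 2 and `matrix_in_basis(A, basis)` is `[[0, 1, 0], [-1, 0, 0], [0, 0, 0]]` (entries are `Fraction`s), the direct sum of one copy of (10-8) and a $1 \times 1$ zero block. The projected vectors remain a basis of $W^\perp$ at each step, because the projection is onto $W^\perp$ with kernel $W$.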


One consequence of the above is that if $f$ is a non-degenerate, skew-symmetric
bilinear form on $V$, then the dimension of $V$ must be even. If
$\dim V = 2k$, there will be an ordered basis $\{\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k\}$ for $V$ such
that

$$f(\alpha_i, \beta_j) = \begin{cases} 0, & i \neq j \\ 1, & i = j \end{cases}$$
$$f(\alpha_i, \alpha_j) = f(\beta_i, \beta_j) = 0.$$

The matrix of $f$ in this ordered basis is the direct sum of $k$ copies of the
$2 \times 2$ skew-symmetric matrix (10-8). We obtain another standard form
for the matrix of a non-degenerate skew-symmetric form if, instead of the
ordered basis above, we consider the ordered basis

$$\{\alpha_1, \ldots, \alpha_k, \beta_k, \ldots, \beta_1\}.$$




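The reader should find it easy to verify that the matrix of $f$ in the latter
ordered basis has the block form

$$\begin{bmatrix} 0 & J \\ -J & 0 \end{bmatrix}$$

where $J$ is the $k \times k$ matrix

$$J = \begin{bmatrix} 0 & \cdots & 0 & 1 \\ 0 & \cdots & 1 & 0 \\ \vdots & & & \vdots \\ 1 & \cdots & 0 & 0 \end{bmatrix}.$$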
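To see this block form concretely, the Python sketch below starts from the matrix of $f$ in the basis $\{\alpha_1, \beta_1, \ldots, \alpha_k, \beta_k\}$, reorders the basis as $\{\alpha_1, \ldots, \alpha_k, \beta_k, \ldots, \beta_1\}$, and compares the result with $\begin{bmatrix} 0 & J \\ -J & 0 \end{bmatrix}$. The function names are ours, and the indexing convention (0-based index $2j$ holds $\alpha_{j+1}$, index $2j+1$ holds $\beta_{j+1}$) is an assumption made for the illustration.

```python
# Illustrative sketch; function names and indexing convention are ours.

def paired_form(k):
    """Matrix of f in the basis (a1, b1, ..., ak, bk): k copies of [[0, 1], [-1, 0]]."""
    n = 2 * k
    M = [[0] * n for _ in range(n)]
    for j in range(k):
        M[2 * j][2 * j + 1] = 1    # f(a_j, b_j) = 1
        M[2 * j + 1][2 * j] = -1   # f(b_j, a_j) = -1
    return M

def reorder(M, order):
    """Matrix of the same form in the basis listed by 'order' (old indices)."""
    return [[M[p][q] for q in order] for p in order]

def antidiagonal_block_form(k):
    """The block matrix [[0, J], [-J, 0]] with J the k x k reversal matrix."""
    J = [[1 if i + j == k - 1 else 0 for j in range(k)] for i in range(k)]
    top = [[0] * k + J[i] for i in range(k)]
    bottom = [[-x for x in J[i]] + [0] * k for i in range(k)]
    return top + bottom

k = 3
# New order: a1, ..., ak, then bk, ..., b1.
order = [2 * j for j in range(k)] + [2 * j + 1 for j in reversed(range(k))]
reordered = reorder(paired_form(k), order)
```

Reversing the $\beta$'s is what puts the ones on the anti-diagonal: $\alpha_i$ sits in position $i$ and its partner $\beta_i$ in position $2k + 1 - i$.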


Exercises 


1. Let $V$ be a vector space over a field $F$. Show that the set of all skew-symmetric
bilinear forms on $V$ is a subspace of $L(V, V, F)$.

2. Find all skew-symmetric bilinear forms on $R^3$.

3. Find a basis for the space of all skew-symmetric bilinear forms on $R^n$.

4. Let $f$ be a symmetric bilinear form on $C^n$ and $g$ a skew-symmetric bilinear
form on $C^n$. Suppose $f + g = 0$. Show that $f = g = 0$.


5. Let $V$ be an $n$-dimensional vector space over a subfield $F$ of $C$. Prove the
following.

(a) The equation $(Pf)(\alpha, \beta) = \frac{1}{2}f(\alpha, \beta) - \frac{1}{2}f(\beta, \alpha)$ defines a linear operator $P$
on $L(V, V, F)$.
(b) $P^2 = P$, i.e., $P$ is a projection.
(c) $\operatorname{rank} P = \frac{n(n - 1)}{2}$; $\operatorname{nullity} P = \frac{n(n + 1)}{2}$.
(d) If $U$ is a linear operator on $V$, the equation $(U^\dagger f)(\alpha, \beta) = f(U\alpha, U\beta)$
defines a linear operator $U^\dagger$ on $L(V, V, F)$.
(e) For every linear operator $U$, the projection $P$ commutes with $U^\dagger$.


6. Prove an analogue of Exercise 11 in Section 10.2 for non-degenerate, skew-symmetric
bilinear forms.

7. Let $f$ be a bilinear form on a vector space $V$. Let $L_f$ and $R_f$ be the mappings of
$V$ into $V^*$ associated with $f$ in Section 10.1. Prove that $f$ is skew-symmetric if and
only if $L_f = -R_f$.

8. Prove an analogue of Exercise 17 in Section 10.2 for skew-symmetric forms.

9. Let $V$ be a finite-dimensional vector space and $L_1$, $L_2$ linear functionals on $V$.
Show that the equation

$$f(\alpha, \beta) = L_1(\alpha)L_2(\beta) - L_1(\beta)L_2(\alpha)$$

defines a skew-symmetric bilinear form on $V$. Show that $f = 0$ if and only if $L_1$, $L_2$
are linearly dependent.


10. Let $V$ be a finite-dimensional vector space over a subfield of the complex
numbers and $f$ a skew-symmetric bilinear form on $V$. Show that $f$ has rank 2 if
and only if there exist linearly independent linear functionals $L_1$, $L_2$ on $V$ such that

$$f(\alpha, \beta) = L_1(\alpha)L_2(\beta) - L_1(\beta)L_2(\alpha).$$

11. Let $f$ be any skew-symmetric bilinear form on $R^3$. Prove that there are linear
functionals $L_1$, $L_2$ such that

$$f(\alpha, \beta) = L_1(\alpha)L_2(\beta) - L_1(\beta)L_2(\alpha).$$

12. Let $V$ be a finite-dimensional vector space over a subfield of the complex
numbers, and let $f$, $g$ be skew-symmetric bilinear forms on $V$. Show that there is
an invertible linear operator $T$ on $V$ such that $f(T\alpha, T\beta) = g(\alpha, \beta)$ for all $\alpha$, $\beta$
if and only if $f$ and $g$ have the same rank.

13. Show that the result of Exercise 12 is valid for symmetric bilinear forms on a
complex vector space, but is not valid for symmetric bilinear forms on a real vector
space.
10.4. Groups Preserving Bilinear Forms

Let $f$ be a bilinear form on the vector space $V$, and let $T$ be a linear
operator on $V$. We say that $T$ preserves $f$ if $f(T\alpha, T\beta) = f(\alpha, \beta)$ for
all $\alpha$, $\beta$ in $V$. For any $T$ and $f$ the function $g$, defined by $g(\alpha, \beta) = f(T\alpha, T\beta)$,
is easily seen to be a bilinear form on $V$. To say that $T$ preserves $f$ is simply
to say $g = f$. The identity operator preserves every bilinear form. If $S$
and $T$ are linear operators which preserve $f$, the product $ST$ also preserves
$f$; for $f(ST\alpha, ST\beta) = f(T\alpha, T\beta) = f(\alpha, \beta)$. In other words, the collection
of linear operators which preserve a given bilinear form is closed under
the formation of (operator) products. In general, one cannot say much
more about this collection of operators; however, if $f$ is non-degenerate,
we have the following.


Theorem 7. Let f be a non-degenerate bilinear form on a finite- 
dimensional vector space V. The set of all linear operators on V which preserve 
f is a group under the operation of composition. 


Proof. Let $G$ be the set of linear operators preserving $f$. We
observed that the identity operator is in $G$ and that whenever $S$ and $T$
are in $G$ the composition $ST$ is also in $G$. From the fact that $f$ is non-degenerate,
we shall prove that any operator $T$ in $G$ is invertible, and
$T^{-1}$ is also in $G$. Suppose $T$ preserves $f$. Let $\alpha$ be a vector in the null space
of $T$. Then for any $\beta$ in $V$ we have

$$f(\alpha, \beta) = f(T\alpha, T\beta) = f(0, T\beta) = 0.$$

Since $f$ is non-degenerate, $\alpha = 0$. Thus the null space of $T$ is $\{0\}$, and since $V$ is
finite-dimensional, $T$ is invertible. Clearly $T^{-1}$ also preserves $f$; for

$$f(T^{-1}\alpha, T^{-1}\beta) = f(TT^{-1}\alpha, TT^{-1}\beta) = f(\alpha, \beta). \qquad \blacksquare$$
If $f$ is a non-degenerate bilinear form on the finite-dimensional space
$V$, then each ordered basis $\mathcal{B}$ for $V$ determines a group of matrices
'preserving' $f$. The set of all matrices $[T]_{\mathcal{B}}$, where $T$ is a linear operator
preserving $f$, will be a group under matrix multiplication. There is an
alternative description of this group of matrices, as follows. Let $A = [f]_{\mathcal{B}}$,
so that if $\alpha$ and $\beta$ are vectors in $V$ with respective coordinate matrices $X$
and $Y$ relative to $\mathcal{B}$, we shall have

$$f(\alpha, \beta) = X^tAY.$$

Let $T$ be any linear operator on $V$ and $M = [T]_{\mathcal{B}}$. Then

$$\begin{aligned}
f(T\alpha, T\beta) &= (MX)^tA(MY) \\
&= X^t(M^tAM)Y.
\end{aligned}$$

Accordingly, $T$ preserves $f$ if and only if $M^tAM = A$. In matrix language
then, Theorem 7 says the following: If $A$ is an invertible $n \times n$ matrix,
the set of all $n \times n$ matrices $M$ such that $M^tAM = A$ is a group under
matrix multiplication. If $A = [f]_{\mathcal{B}}$, then $M$ is in this group of matrices if
and only if $M = [T]_{\mathcal{B}}$, where $T$ is a linear operator which preserves $f$.
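The matrix form of Theorem 7 is easy to test on a small case. The sketch below, with helper names of our own choosing and exact `Fraction` arithmetic, takes $A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}$, for which a $2 \times 2$ matrix satisfies $M^tAM = A$ exactly when $\det M = 1$, and checks that the condition survives products and inverses.

```python
from fractions import Fraction

# Illustrative sketch; helper names are ours.

def matmul(X, Y):
    """Product of matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inverse_2x2(M):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def preserves(M, A):
    """True when M^t A M == A, i.e. M lies in the group attached to A."""
    return matmul(matmul(transpose(M), A), M) == A

A = [[0, 1], [-1, 0]]               # invertible and skew-symmetric
M1 = [[1, 1], [0, 1]]               # det 1
M2 = [[2, 0], [0, Fraction(1, 2)]]  # det 1
N = [[2, 0], [0, 1]]                # det 2, so N should not preserve f
```

`M1`, `M2`, their product and their inverses all pass `preserves`, while `N` fails. For this particular $A$ one has $M^tAM = (\det M)A$, which is the content of Exercise 15 at the end of this section.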
Before turning to some examples, let us make one further remark.
Suppose $f$ is a bilinear form which is symmetric. A linear operator $T$ preserves
$f$ if and only if $T$ preserves the quadratic form

$$q(\alpha) = f(\alpha, \alpha)$$

associated with $f$. If $T$ preserves $f$, we certainly have

$$q(T\alpha) = f(T\alpha, T\alpha) = f(\alpha, \alpha) = q(\alpha)$$

for every $\alpha$ in $V$. Conversely, since $f$ is symmetric, the polarization identity

$$f(\alpha, \beta) = \tfrac{1}{4}q(\alpha + \beta) - \tfrac{1}{4}q(\alpha - \beta)$$

shows us that $T$ preserves $f$ provided that $q(T\gamma) = q(\gamma)$ for each $\gamma$ in $V$.
(We are assuming here that the scalar field is a subfield of the complex
numbers.)
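The polarization identity can be checked on an example. The sketch below (plain Python with exact arithmetic; the matrix and the names are our own choices) defines $f(X, Y) = X^tAY$ for an arbitrary symmetric $A$, sets $q(X) = f(X, X)$, and compares $f$ with $\frac{1}{4}q(X + Y) - \frac{1}{4}q(X - Y)$ on a grid of vectors.

```python
from fractions import Fraction
from itertools import product

# Illustrative sketch; A is an arbitrary symmetric matrix chosen for the example.
A = [[2, 1, 0], [1, -3, 5], [0, 5, 4]]

def f(x, y):
    """f(x, y) = x^t A y."""
    return sum(x[i] * A[i][j] * y[j] for i in range(3) for j in range(3))

def q(x):
    """The quadratic form associated with f."""
    return f(x, x)

def polarized(x, y):
    """Recover f(x, y) from q alone; valid because f is symmetric."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return Fraction(q(s) - q(d), 4)

vectors = list(product([-1, 0, 2], repeat=3))
mismatches = [(x, y) for x in vectors for y in vectors if polarized(x, y) != f(x, y)]
```

Expanding shows why symmetry matters: $q(\alpha + \beta) - q(\alpha - \beta) = 2f(\alpha, \beta) + 2f(\beta, \alpha)$, which equals $4f(\alpha, \beta)$ only when $f$ is symmetric.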


EXAMPLE 6. Let $V$ be either the space $R^n$ or the space $C^n$. Let $f$ be
the bilinear form

$$f(\alpha, \beta) = \sum_{j=1}^{n} x_j y_j$$

where $\alpha = (x_1, \ldots, x_n)$ and $\beta = (y_1, \ldots, y_n)$. The group preserving $f$ is
called the $n$-dimensional (real or complex) orthogonal group. The
name 'orthogonal group' is more commonly applied to the associated
group of matrices in the standard ordered basis. Since the matrix of $f$
in the standard basis is $I$, this group consists of the matrices $M$ which
satisfy $M^tM = I$. Such a matrix $M$ is called an $n \times n$ (real or complex)
orthogonal matrix. The two $n \times n$ orthogonal groups are usually denoted
$O(n, R)$ and $O(n, C)$. Of course, the orthogonal group is also the
group which preserves the quadratic form

$$q(x_1, \ldots, x_n) = x_1^2 + \cdots + x_n^2.$$


EXAMPLE 7. Let $f$ be the symmetric bilinear form on $R^n$ with quadratic form

$$q(x_1, \ldots, x_n) = \sum_{j=1}^{p} x_j^2 - \sum_{j=p+1}^{n} x_j^2.$$

Then $f$ is non-degenerate and has signature $2p - n$. The group of matrices
preserving a form of this type is called a pseudo-orthogonal group.
When $p = n$, we obtain the orthogonal group $O(n, R)$ as a particular type
of pseudo-orthogonal group. For each of the $n + 1$ values $p = 0, 1, 2, \ldots, n$,
we obtain different bilinear forms $f$; however, for $p = k$ and $p = n - k$
the forms are, after a reordering of the coordinates, negatives of one another
and hence have the same associated group. Thus, when $n$ is odd, we have $(n + 1)/2$ pseudo-orthogonal groups
of $n \times n$ matrices, and when $n$ is even, we have $(n + 2)/2$ such groups.
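The count at the end of Example 7 amounts to counting the unordered pairs $\{p, n - p\}$ with $0 \leq p \leq n$. A short Python check of the two formulas (the function names are ours):

```python
# Illustrative sketch; function names are ours.

def pseudo_orthogonal_count(n):
    """Number of distinct classes {p, n - p} for p = 0, ..., n."""
    return len({frozenset((p, n - p)) for p in range(n + 1)})

def formula(n):
    """(n + 1)/2 for odd n and (n + 2)/2 for even n, as stated in the text."""
    return (n + 1) // 2 if n % 2 == 1 else (n + 2) // 2
```

For even $n$ the extra class is $p = n/2$, the one value paired with itself.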


Theorem 8. Let $V$ be an $n$-dimensional vector space over the field of
complex numbers, and let $f$ be a non-degenerate symmetric bilinear form on $V$.
Then the group preserving $f$ is isomorphic to the complex orthogonal group
$O(n, C)$.

Proof. Of course, by an isomorphism between two groups, we
mean a one-one correspondence between their elements which 'preserves'
the group operation. Let $G$ be the group of linear operators on $V$ which
preserve the bilinear form $f$. Since $f$ is both symmetric and non-degenerate,
Theorem 4 tells us that there is an ordered basis $\mathcal{B}$ for $V$ in which $f$ is
represented by the $n \times n$ identity matrix. Therefore, a linear operator $T$
preserves $f$ if and only if its matrix in the ordered basis $\mathcal{B}$ is a complex
orthogonal matrix. Hence

$$T \to [T]_{\mathcal{B}}$$

is an isomorphism of $G$ onto $O(n, C)$. ■


Theorem 9. Let V be an n-dimensional vector space over the field of 
real numbers, and let f be a non-degenerate symmetric bilinear form on V. 
Then the group preserving f is isomorphic to an n X n pseudo-orthogonal 
group. 

Proof. Repeat the proof of Theorem 8, using Theorem 5 instead
of Theorem 4. ■


EXAMPLE 8. Let $f$ be the symmetric bilinear form on $R^4$ with quadratic form

$$q(x, y, z, t) = t^2 - x^2 - y^2 - z^2.$$

A linear operator $T$ on $R^4$ which preserves this particular bilinear (or
quadratic) form is called a Lorentz transformation, and the group preserving
$f$ is called the Lorentz group. We should like to give one method
of describing some Lorentz transformations.

Let $H$ be the real vector space of all $2 \times 2$ complex matrices $A$ which
are Hermitian, $A = A^*$. It is easy to verify that

$$\Phi(x, y, z, t) = \begin{bmatrix} t + x & y + iz \\ y - iz & t - x \end{bmatrix}$$

defines an isomorphism $\Phi$ of $R^4$ onto the space $H$. Under this isomorphism,
the quadratic form $q$ is carried onto the determinant function, that is

$$q(x, y, z, t) = \det \begin{bmatrix} t + x & y + iz \\ y - iz & t - x \end{bmatrix}$$

or

$$q(\alpha) = \det(\Phi\alpha).$$

This suggests that we might study Lorentz transformations on $R^4$ by
studying linear operators on $H$ which preserve determinants.
Let $M$ be any complex $2 \times 2$ matrix and for a Hermitian matrix $A$
define

$$U_M(A) = MAM^*.$$

Now $MAM^*$ is also Hermitian. From this it is easy to see that $U_M$ is a
(real) linear operator on $H$. Let us ask when it is true that $U_M$ 'preserves'
determinants, i.e., $\det[U_M(A)] = \det A$ for each $A$ in $H$. Since the
determinant of $M^*$ is the complex conjugate of the determinant of $M$,
we see that

$$\det[U_M(A)] = |\det M|^2 \det A.$$

Thus $U_M$ preserves determinants exactly when $\det M$ has absolute value 1.

So now let us select any $2 \times 2$ complex matrix $M$ for which
$|\det M| = 1$. Then $U_M$ is a linear operator on $H$ which preserves
determinants. Define

$$T_M = \Phi^{-1} U_M \Phi.$$

Since $\Phi$ is an isomorphism, $T_M$ is a linear operator on $R^4$. Also, $T_M$ is a
Lorentz transformation; for

$$\begin{aligned}
q(T_M\alpha) &= q(\Phi^{-1}U_M\Phi\alpha) \\
&= \det(\Phi\Phi^{-1}U_M\Phi\alpha) \\
&= \det(U_M\Phi\alpha) \\
&= \det(\Phi\alpha) \\
&= q(\alpha)
\end{aligned}$$

and so $T_M$ preserves the quadratic form $q$.
By using specific $2 \times 2$ matrices $M$, one can use the method above
to compute specific Lorentz transformations. There are two comments
which we might make here; they are not difficult to verify.

(1) If $M_1$ and $M_2$ are invertible $2 \times 2$ matrices with complex entries,
then $U_{M_1} = U_{M_2}$ if and only if $M_2 = cM_1$ for some scalar $c$ with $|c| = 1$. Thus, all of
the Lorentz transformations exhibited above are obtainable from unimodular
matrices $M$, that is, from matrices $M$ satisfying $\det M = 1$. If
$M_1$ and $M_2$ are unimodular matrices such that $M_1 \neq M_2$ and $M_1 \neq -M_2$,
then $T_{M_1} \neq T_{M_2}$.

(2) Not every Lorentz transformation is obtainable by the above
method.
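The construction $T_M = \Phi^{-1}U_M\Phi$ is easy to try out with Python's built-in complex numbers. In the sketch below the helper names are ours, matrices are nested lists, and the checks are done in floating point, so equalities are tested only up to a small tolerance. It also illustrates comment (1) in the weak form $T_M = T_{-M}$, and the scaling $q(T_M\alpha) = |\det M|^2 q(\alpha)$ when $|\det M| \neq 1$.

```python
# Illustrative sketch; helper names are ours, arithmetic is floating point.

def phi(x, y, z, t):
    """Phi: R^4 -> H, (x, y, z, t) |-> [[t + x, y + iz], [y - iz, t - x]]."""
    return [[complex(t + x, 0), complex(y, z)], [complex(y, -z), complex(t - x, 0)]]

def phi_inverse(H):
    """Recover (x, y, z, t) from a Hermitian 2 x 2 matrix H."""
    t = (H[0][0].real + H[1][1].real) / 2
    x = (H[0][0].real - H[1][1].real) / 2
    return (x, H[0][1].real, H[0][1].imag, t)

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def adjoint(M):
    """Conjugate transpose M*."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]

def T(M, v):
    """T_M = Phi^{-1} U_M Phi applied to v = (x, y, z, t), where U_M(A) = M A M*."""
    return phi_inverse(matmul(matmul(M, phi(*v)), adjoint(M)))

def q(v):
    x, y, z, t = v
    return t * t - x * x - y * y - z * z

M = [[1 + 0j, 2j], [0j, 1 + 0j]]  # det M = 1
v = (1.0, 2.0, 3.0, 4.0)          # q(v) = 16 - 1 - 4 - 9 = 2
w = T(M, v)
```

Here `w` works out by hand to $(13, 2, 9, 16)$, and indeed $16^2 - 13^2 - 2^2 - 9^2 = 2 = q(v)$.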


Exercises

1. Let $M$ be a member of the complex orthogonal group, $O(n, C)$. Show that $M^t$,
$\bar{M}$, and $M^* = \bar{M}^t$ also belong to $O(n, C)$.

2. Suppose $M$ belongs to $O(n, C)$ and that $M'$ is similar to $M$. Does $M'$ also
belong to $O(n, C)$?

3. Let

$$y_j = \sum_{k=1}^{n} M_{jk} x_k$$

where $M$ is a member of $O(n, C)$. Show that

$$\sum_j y_j^2 = \sum_j x_j^2.$$

4. Let $M$ be an $n \times n$ matrix over $C$ with columns $M_1, M_2, \ldots, M_n$. Show that
$M$ belongs to $O(n, C)$ if and only if

$$M_j^t M_k = \delta_{jk}.$$


5. Let $X$ be an $n \times 1$ matrix over $C$. Under what conditions does $O(n, C)$ contain
a matrix $M$ whose first column is $X$?

6. Find a matrix in $O(3, C)$ whose first row is $(2i, 2i, 3)$.

7. Let $V$ be the space of all $n \times 1$ matrices over $C$ and $f$ the bilinear form on $V$
given by $f(X, Y) = X^tY$. Let $M$ belong to $O(n, C)$. What is the matrix of $f$ in the
basis of $V$ consisting of the columns $M_1, M_2, \ldots, M_n$ of $M$?

8. Let $X$ be an $n \times 1$ matrix over $C$ such that $X^tX = 1$, and $I_j$ be the $j$th column
of the identity matrix. Show there is a matrix $M$ in $O(n, C)$ such that $MX = I_j$.
If $X$ has real entries, show there is an $M$ in $O(n, R)$ with the property that $MX = I_j$.

9. Let $V$ be the space of all $n \times 1$ matrices over $C$, $A$ an $n \times n$ matrix over $C$,
and $f$ the bilinear form on $V$ given by $f(X, Y) = X^tAY$. Show that $f$ is invariant
under $O(n, C)$, i.e., $f(MX, MY) = f(X, Y)$ for all $X$, $Y$ in $V$ and $M$ in $O(n, C)$,
if and only if $A$ commutes with each member of $O(n, C)$.

10. Let $S$ be any set of $n \times n$ matrices over $C$ and $S'$ the set of all $n \times n$ matrices
over $C$ which commute with each element of $S$. Show that $S'$ is an algebra over $C$.
11. Let $F$ be a subfield of $C$, $V$ a finite-dimensional vector space over $F$, and $f$ a
non-singular bilinear form on $V$. If $T$ is a linear operator on $V$ preserving $f$, prove
that $\det T = \pm 1$.

12. Let $F$ be a subfield of $C$, $V$ the space of $n \times 1$ matrices over $F$, $A$ an invertible
$n \times n$ matrix over $F$, and $f$ the bilinear form on $V$ given by $f(X, Y) = X^tAY$.
If $M$ is an $n \times n$ matrix over $F$, show that $M$ preserves $f$ if and only if $A^{-1}M^tA = M^{-1}$.

13. Let $g$ be a non-singular bilinear form on a finite-dimensional vector space $V$.
Suppose $T$ is an invertible linear operator on $V$ and that $f$ is the bilinear form
on $V$ given by $f(\alpha, \beta) = g(\alpha, T\beta)$. If $U$ is a linear operator on $V$, find necessary
and sufficient conditions for $U$ to preserve $f$.


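14. Let $T$ be a linear operator on $C^2$ which preserves the quadratic form $x_1^2 - x_2^2$.
Show that

(a) $\det(T) = \pm 1$.

(b) If $M$ is the matrix of $T$ in the standard basis, then $M_{22} = \pm M_{11}$, $M_{21} = \pm M_{12}$,
$M_{11}^2 - M_{12}^2 = 1$.

(c) If $\det M = 1$, then there is a non-zero complex number $c$ such that

$$M = \frac{1}{2}\begin{bmatrix} c + \frac{1}{c} & c - \frac{1}{c} \\ c - \frac{1}{c} & c + \frac{1}{c} \end{bmatrix}.$$

(d) If $\det M = -1$, then there is a non-zero complex number $c$ such that

$$M = \frac{1}{2}\begin{bmatrix} c + \frac{1}{c} & c - \frac{1}{c} \\ -c + \frac{1}{c} & -c - \frac{1}{c} \end{bmatrix}.$$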
15. Let $f$ be the bilinear form on $C^2$ defined by

$$f((x_1, x_2), (y_1, y_2)) = x_1y_2 - x_2y_1.$$

Show that

(a) if $T$ is a linear operator on $C^2$, then $f(T\alpha, T\beta) = (\det T)f(\alpha, \beta)$ for all
$\alpha$, $\beta$ in $C^2$.

(b) $T$ preserves $f$ if and only if $\det T = +1$.

(c) What does (b) say about the group of $2 \times 2$ matrices $M$ such that
$M^tAM = A$ where

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}?$$


16. Let $n$ be a positive integer, $I$ the $n \times n$ identity matrix over $C$, and $J$ the
$2n \times 2n$ matrix given by

$$J = \begin{bmatrix} 0 & I \\ -I & 0 \end{bmatrix}.$$

Let $M$ be a $2n \times 2n$ matrix over $C$ of the form

$$M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}$$

where $A$, $B$, $C$, $D$ are $n \times n$ matrices over $C$. Find necessary and sufficient conditions
on $A$, $B$, $C$, $D$ in order that $M^tJM = J$.

17. Find all bilinear forms on the space of $n \times 1$ matrices over $R$ which are
invariant under $O(n, R)$.

18. Find all bilinear forms on the space of $n \times 1$ matrices over $C$ which are
invariant under $O(n, C)$.

Appendix


This Appendix separates logically into two parts. The first part, 
comprising the first three sections, contains certain fundamental concepts 
which occur throughout the book (indeed, throughout mathematics). It 
is more in the nature of an introduction for the book than an appendix. 
The second part is more genuinely an appendix to the text. 

Section 1 contains a discussion of sets, their unions and intersections. 
Section 2 discusses the concept of function, and the related ideas of range, 
domain, inverse function, and the restriction of a function to a subset of 
its domain. Section 3 treats equivalence relations. The material in these 
three sections, especially that in Sections 1 and 2, is presented in a rather 
concise manner. It is treated more as an agreement upon terminology 
than as a detailed exposition. In a strict logical sense, this material con- 
stitutes a portion of the prerequisites for reading the book; however, the 
reader should not be discouraged if he does not completely grasp the 
significance of the ideas on his first reading. These ideas are important, 
but the reader who is not too familiar with them should find it easier to 
absorb them if he reviews the discussion from time to time while reading 
the text proper. 

Sections 4 and 5 deal with equivalence relations in the context of 
linear algebra. Section 4 contains a brief discussion of quotient spaces. 
It can be read at any time after the first two or three chapters of the book. 
Section 5 takes a look at some of the equivalence relations which arise in 
the book, attempting to indicate how some of the results in the book might 
be interpreted from the point of view of equivalence relations. Section 6 
describes the Axiom of choice and its implications for linear algebra. 




A.1. Sets


We shall use the words 'set,' 'class,' 'collection,' and 'family' interchangeably,
although we give preference to 'set.' If S is a set and x is
an object in the set S, we shall say that x is a member of S, that x is an
element of S, that x belongs to S, or simply that x is in S. If S has
only a finite number of members, x₁, ..., xₙ, we shall often describe S
by displaying its members inside braces:

$$S = \{x_1, \ldots, x_n\}.$$

Thus, the set S of positive integers from 1 through 5 would be

$$S = \{1, 2, 3, 4, 5\}.$$


If S and T are sets, we say that S is a subset of T, or that S is con- 
tained in T, if each member of S is a member of T. Each set S is a subset 
of itself. If S is a subset of T but S and T are not identical, we call S a 
proper subset of T. In other words, S is a proper subset of T provided 
that S is contained in T but T is not contained in S. 

If S and T are sets, the union of S and T is the set S ∪ T, consisting
of all objects x which are members of either S or T. The intersection
of S and T is the set S ∩ T, consisting of all x which are members of
both S and T. For any two sets, S and T, the intersection S ∩ T is a
subset of the union S ∪ T. This should help to clarify the use of the word
'or' which will prevail in this book. When we say that x is either in S or
in T, we do not preclude the possibility that x is in both S and T.

In order that the intersection of S and T should always be a set, it
is necessary that one introduce the empty set, i.e., the set with no members.
Then S ∩ T is the empty set if and only if S and T have no members
in common.

We shall frequently need to discuss the union or intersection of several
sets. If S₁, ..., Sₙ are sets, their union is the set

$$\bigcup_{j=1}^{n} S_j$$

consisting of all x which are members of at least one of the sets S₁, ..., Sₙ.
Their intersection is the set

$$\bigcap_{j=1}^{n} S_j,$$

consisting of all x which are members of each of the sets S₁, ..., Sₙ. On a few
occasions, we shall discuss the union or intersection of an infinite collection
of sets. It should be clear how such unions and intersections are defined. The
following example should clarify these definitions and a notation for them.


EXAMPLE 1. Let R denote the set of all real numbers (the real line).
If t is in R, we associate with t a subset $S_t$ of R, defined as follows: $S_t$
consists of all real numbers x which are not less than t.

(a) $S_{t_1} \cup S_{t_2} = S_t$, where t is the smaller of $t_1$ and $t_2$.

(b) $S_{t_1} \cap S_{t_2} = S_t$, where t is the larger of $t_1$ and $t_2$.

(c) Let I be the unit interval, that is, the set of all t in R satisfying
0 ≤ t ≤ 1. Then

$$\bigcup_{t \in I} S_t = S_0, \qquad \bigcap_{t \in I} S_t = S_1.$$
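Example 1 can be mirrored in a few lines of Python if each set $S_t$ is represented by its membership test. This is only an illustration: the helper names are ours, and parts (a) to (c) are checked at finitely many sample points, with I replaced by a finite sample that contains both endpoints, so the code illustrates the identities rather than proving them.

```python
# Illustrative sketch; sets of reals are modelled by membership tests.

def S(t):
    """Membership test for S_t = {x in R : x >= t}."""
    return lambda x: x >= t

def union(*sets):
    return lambda x: any(s(x) for s in sets)

def intersection(*sets):
    return lambda x: all(s(x) for s in sets)

t1, t2 = 2.0, 5.0
points = [k / 4 for k in range(-20, 41)]  # sample points from -5 to 10

# (a) and (b): compare membership at every sample point.
union_ok = all(union(S(t1), S(t2))(x) == S(min(t1, t2))(x) for x in points)
inter_ok = all(intersection(S(t1), S(t2))(x) == S(max(t1, t2))(x) for x in points)

# (c): a finite sample of the unit interval I, including 0 and 1.
I_sample = [k / 10 for k in range(11)]
big_union = union(*(S(t) for t in I_sample))
big_inter = intersection(*(S(t) for t in I_sample))
c_ok = all(big_union(x) == S(0)(x) and big_inter(x) == S(1)(x) for x in points)
```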


A.2. Functions 


A function consists of the following: 


(1) a set X, called the domain of the function; 


(2) a set Y, called the co-domain of the function; 
(3) a rule (or correspondence) f, which associates with each element 
x of X a single element f(x) of Y. 


If (X, Y, f) is a function, we shall also say f is a function from X 
into Y. This is a bit sloppy, since it is not f which is the function; f is 
the rule of the function. However, this use of the same symbol for the 
function and its rule provides one with a much more tractable way of 
speaking about functions. Thus we shall say that f is a function from X 
into Y, that X is the domain of f, and that Y is the co-domain of f—all 
this meaning that (X, Y, f) is a function as defined above. There are 
several other words which are commonly used in place of the word ‘func- 
tion.’ Some of these are ‘transformation,’ ‘operator,’ and ‘mapping.’ 
These are used in contexts where they seem more suggestive in conveying 
the role played by a particular function. 

If f is a function from X into Y, the range (or image) of f is the set 
of all f(x), x in X. In other words, the range of f consists of all elements 
y in Y such that y = f(x) for some x in X. If the range of f is all of Y,
we say that f is a function from X onto Y, or simply that f is onto. The 
range of f is often denoted f(X). 


EXAMPLE 2. (a) Let X be the set of real numbers, and let Y = X. 
Let f be the function from X into Y defined by f(x) = x². The range of
f is the set of all non-negative real numbers. Thus f is not onto. 

(b) Let X be the Euclidean plane, and Y = X. Let f be defined as 
follows: If P is a point in the plane, then f(P) is the point obtained by 
rotating P through 90° (about the origin, in the counterclockwise direc- 
tion). The range of f is all of Y, i.e., the entire plane, and so f is onto. 

(c) Again let X be the Euclidean plane. Coordinatize X as in analytic
geometry, using two perpendicular lines to identify the points of X with
ordered pairs of real numbers (x₁, x₂). Let Y be the x₁-axis, that is, all
points (x₁, x₂) with x₂ = 0. If P is a point of X, let f(P) be the point
obtained by projecting P onto the x₁-axis, parallel to the x₂-axis. In other
words, f((x₁, x₂)) = (x₁, 0). The range of f is all of Y, and so f is onto.

(d) Let X be the set of real numbers, and let Y be the set of positive
real numbers. Define a function f from X into Y by f(x) = eˣ. Then f is
a function from X onto Y.

(e) Let X be the set of positive real numbers and Y the set of all real
numbers. Let f be the natural logarithm function, that is, the function
defined by f(x) = log x = ln x. Again f is onto, i.e., every real number
is the natural logarithm of some positive number.


Suppose that X, Y, and Z are sets, that f is a function from X into
Y, and that g is a function from Y into Z. There is associated with f and g
a function g∘f from X into Z, known as the composition of g and f.
It is defined by

$$(g \circ f)(x) = g(f(x)).$$

For one simple example, let X = Y = Z, the set of real numbers; let
f, g, h be the functions from X into X defined by

$$f(x) = x^2, \qquad g(x) = e^x, \qquad h(x) = e^{x^2},$$

and then h = g∘f. The composition g∘f is often denoted simply gf;
however, as the above simple example shows, there are times when this
may lead to confusion, since gf might also be read as the product
function x ↦ g(x)f(x) = x²eˣ.

One question of interest is the following. Suppose f is a function from
X into Y. When is there a function g from Y into X such that g(f(x)) = x
for each x in X? If we denote by I the identity function on X, that is,
the function from X into X defined by I(x) = x, we are asking the following:
When is there a function g from Y into X such that g∘f = I?
Roughly speaking, we want a function g which 'sends each element of Y
back where it came from.' In order for such a g to exist, f clearly must be
1:1, that is, f must have the property that if x₁ ≠ x₂ then f(x₁) ≠ f(x₂).
If f is 1:1, such a g does exist. It is defined as follows: Let y be an element
of Y. If y is in the range of f, then there is an element x in X such that
y = f(x); and since f is 1:1, there is exactly one such x. Define g(y) = x.
If y is not in the range of f, define g(y) to be any element of X. Clearly we
then have g∘f = I.
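The construction of g becomes concrete when X and Y are finite. In the Python sketch below a function is represented by a dictionary (an assumption made only for the illustration), and the names `is_one_to_one` and `left_inverse` are ours. Points of Y outside the range of f are sent to one fixed, arbitrarily chosen element of X, exactly as in the text, which is why X must be non-empty.

```python
# Illustrative sketch; functions between finite sets are modelled as dicts.

def is_one_to_one(f):
    """f is a dict from X to Y; 1:1 means no two keys share a value."""
    return len(set(f.values())) == len(f)

def left_inverse(f, X, Y):
    """Return g: Y -> X (as a dict) with g(f(x)) = x for every x in X.

    Requires f to be 1:1 and X to be non-empty.
    """
    if not is_one_to_one(f):
        raise ValueError("f is not 1:1, so no such g exists")
    default = next(iter(X))          # any element of X will do
    g = {y: default for y in Y}      # used for y outside the range of f
    for x, y in f.items():
        g[y] = x                     # the unique x with f(x) = y
    return g

X = {1, 2, 3}
Y = {"a", "b", "c", "d"}
f = {1: "a", 2: "c", 3: "d"}         # 1:1 but not onto: "b" is missed
g = left_inverse(f, X, Y)
```

Here g∘f is the identity on X, but f∘g is not the identity on Y, since f(g("b")) cannot equal "b"; this is the asymmetry taken up in the definition of an invertible function.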

Let f be a function from X into Y. We say that f is invertible if
there is a function g from Y into X such that

(1) g∘f is the identity function on X,
(2) f∘g is the identity function on Y.

We have just seen that if there is a g satisfying (1), then f is 1:1. Similarly,
one can see that if there is a g satisfying (2), the range of f is all of Y, i.e.,
f is onto. Thus, if f is invertible, f is 1:1 and onto. Conversely, if f is 1:1
and onto, there is a function g from Y into X which satisfies (1) and (2).
Furthermore, this g is unique. It is the function from Y into X defined by
this rule: if y is in Y, then g(y) is the one and only element x in X for
which f(x) = y.

If f is invertible (1:1 and onto), the inverse of f is the unique function
f⁻¹ from Y into X satisfying

(1′) f⁻¹(f(x)) = x, for each x in X,
(2′) f(f⁻¹(y)) = y, for each y in Y.


EXAMPLE 3. Let us look at the functions in Example 2.

(a) If X = Y, the set of real numbers, and f(x) = x², then f is not
invertible. For f is neither 1:1 nor onto.

(b) If X = Y, the Euclidean plane, and f is 'rotation through 90°,'
then f is both 1:1 and onto. The inverse function f⁻¹ is 'rotation through
−90°,' or 'rotation through 270°.'

(c) If X is the plane, Y the x₁-axis, and f((x₁, x₂)) = (x₁, 0), then f is
not invertible. For, although f is onto, f is not 1:1.

(d) If X is the set of real numbers, Y the set of positive real numbers,
and f(x) = eˣ, then f is invertible. The function f⁻¹ is the natural logarithm
function of part (e): $\log e^x = x$, $e^{\log y} = y$.

(e) The inverse of this natural logarithm function is the exponential
function of part (d).


Let f be a function from X into Y, and let f₀ be a function from X₀
into Y₀. We call f₀ a restriction of f (or a restriction of f to X₀) if

(1) X₀ is a subset of X,
(2) f₀(x) = f(x) for each x in X₀.

Of course, when f₀ is a restriction of f, it follows that Y₀ is a subset of Y.
The name 'restriction' comes from the fact that f and f₀ have the same
rule, and differ chiefly because we have restricted the domain of definition
of the rule to the subset X₀ of X.

If we are given the function f and any subset X₀ of X, there is an
obvious way to construct a restriction of f to X₀. We define a function
f₀ from X₀ into Y by f₀(x) = f(x) for each x in X₀. One might wonder why
we do not call this the restriction of f to X₀. The reason is that in discussing
restrictions of f we want the freedom to change the co-domain Y,
as well as the domain X.


EXAMPLE 4. (a) Let X be the set of real numbers and f the function
from X into X defined by f(x) = x². Then f is not an invertible function,
but it is if we restrict its domain to the non-negative real numbers. Let
X₀ be the set of non-negative real numbers, and let f₀ be the function
from X₀ into X₀ defined by f₀(x) = x². Then f₀ is a restriction of f to X₀.
Now f is neither 1:1 nor onto, whereas f₀ is both 1:1 and onto. The latter
statement simply says that each non-negative number is the square of
exactly one non-negative number. The inverse function f₀⁻¹ is the function
from X₀ into X₀ defined by f₀⁻¹(x) = √x.

(b) Let X be the set of real numbers, and let f be the function from
X into X defined by f(x) = x³ + x² + 1. The range of f is all of X, and
so f is onto. The function f is certainly not 1:1, e.g., f(−1) = f(0). But
f is 1:1 on X₀, the set of non-negative real numbers, because the derivative
f′(x) = 3x² + 2x is positive for x > 0. As x ranges over all non-negative numbers, f(x)
ranges over all real numbers y such that y ≥ 1. If we let Y₀ be the set of
all y ≥ 1, and let f₀ be the function from X₀ into Y₀ defined by f₀(x) = f(x),
then f₀ is a 1:1 function from X₀ onto Y₀. Accordingly, f₀ has an inverse
function f₀⁻¹ from Y₀ onto X₀. Any formula for f₀⁻¹(y) is rather complicated.

(c) Again let X be the set of real numbers, and let f be the sine
function, that is, the function from X into X defined by f(x) = sin x. The
range of f is the set of all y such that −1 ≤ y ≤ 1; hence, f is not onto.
Since f(x + 2π) = f(x), we see that f is not 1:1. If we let X₀ be the interval
−π/2 ≤ x ≤ π/2, then f is 1:1 on X₀. Let Y₀ be the interval −1 ≤ y ≤ 1,
and let f₀ be the function from X₀ into Y₀ defined by f₀(x) = sin x. Then
f₀ is a restriction of f to the interval X₀, and f₀ is both 1:1 and onto. This
is just another way of saying that, on the interval from −π/2 to π/2,
the sine function takes each value between −1 and 1 exactly once. The
function f₀⁻¹ is the inverse sine function:

$$f_0^{-1}(y) = \sin^{-1} y = \arcsin y.$$

(d) This is a general example of a restriction of a function. It is
much more typical of the type of restriction we shall use in this book
than are the examples in (b) and (c) above. The example in (a) is a special
case of this one. Let X be a set and f a function from X into itself. Let X₀
be a subset of X. We say that X₀ is invariant under f if for each x in X₀
the element f(x) is in X₀. If X₀ is invariant under f, then f induces a
function f₀ from X₀ into itself, by restricting the domain of its definition to X₀.
The importance of invariance is that by restricting f to X₀ we can obtain
a function from X₀ into itself, rather than simply a function from X₀
into X.
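For finite sets, invariance can be tested directly. The Python sketch below is our own illustration; it assumes X is finite (the definition itself applies to any set), and the helper names `is_invariant`, `induced_map`, and `triple_mod_12` are hypothetical. When X₀ is invariant, `induced_map` tabulates the induced function f₀ from X₀ into itself as a dictionary.

```python
from typing import Callable, Dict, Hashable, Iterable


def is_invariant(subset: Iterable[Hashable], f: Callable[[Hashable], Hashable]) -> bool:
    """Return True if f(x) lies in the subset for every x in the subset."""
    s = set(subset)
    return all(f(x) in s for x in s)


def induced_map(subset: Iterable[Hashable],
                f: Callable[[Hashable], Hashable]) -> Dict[Hashable, Hashable]:
    """Return the induced function f0 from X0 into itself, tabulated as a dict."""
    s = set(subset)
    if not is_invariant(s, f):
        raise ValueError("X0 is not invariant under f, so f does not map X0 into itself")
    return {x: f(x) for x in s}


# Hypothetical example: X = {0, 1, ..., 11} and f(x) = 3x mod 12.
def triple_mod_12(x: int) -> int:
    return (3 * x) % 12


print(is_invariant({0, 3, 6, 9}, triple_mod_12))  # True
print(is_invariant({1, 2}, triple_mod_12))        # False, since f(1) = 3
```

Here {0, 3, 6, 9} is invariant (f sends 0, 3, 6, 9 to 0, 9, 6, 3), so f induces a function from this subset into itself, while {1, 2} is not invariant.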
A.3. Equivalence Relations


An equivalence relation is a specific type of relation between pairs 
of elements in a set. To define an equivalence relation, we must first decide 
what a ‘relation’ is. 

Certainly a formal definition of ‘relation’ ought to encompass such
familiar relations as ‘x = y,’ ‘x < y,’ ‘x is the mother of y,’ and ‘x is
older than y.’ If X is a set, what does it take to determine a relation
between pairs of elements of X? What it takes, evidently, is a rule for
determining whether, for any two given elements x and y in X, x stands in
the given relationship to y or not. Such a rule R we shall call a (binary)
relation on X. If we wish to be slightly more precise, we may proceed
as follows. Let X × X denote the set of all ordered pairs (x, y) of elements
of X. A binary relation on X is a function R from X × X into the set
{0, 1}. In other words, R assigns to each ordered pair (x, y) either a 1 or
a 0. The idea is that if R(x, y) = 1, then x stands in the given relationship
to y, and if R(x, y) = 0, it does not.

If R is a binary relation on the set X, it is convenient to write xRy
when R(x, y) = 1. A binary relation R is called


(1) reflexive, if xRx for each x in X;
(2) symmetric, if yRx whenever xRy;
(3) transitive, if xRz whenever xRy and yRz.


An equivalence relation on X is a reflexive, symmetric, and transitive 
binary relation on X. 
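On a finite set the three defining properties can be checked by brute force. This Python sketch is our own illustration: it encodes a relation exactly as above, as a function R from X × X into {0, 1}, and it assumes X is finite. All function names are hypothetical; the four sample relations are those discussed in Example 5(a) and (b) below.

```python
from itertools import product
from typing import Callable, Iterable

# As in the text, a binary relation on X is a function R from X x X into {0, 1}.
Relation = Callable[[object, object], int]


def is_reflexive(X: Iterable, R: Relation) -> bool:
    """xRx for each x in X."""
    return all(R(x, x) == 1 for x in X)


def is_symmetric(X: Iterable, R: Relation) -> bool:
    """yRx whenever xRy."""
    xs = list(X)
    return all(R(y, x) == 1 for x, y in product(xs, repeat=2) if R(x, y) == 1)


def is_transitive(X: Iterable, R: Relation) -> bool:
    """xRz whenever xRy and yRz."""
    xs = list(X)
    return all(
        R(x, z) == 1
        for x, y, z in product(xs, repeat=3)
        if R(x, y) == 1 and R(y, z) == 1
    )


def is_equivalence(X: Iterable, R: Relation) -> bool:
    xs = list(X)
    return is_reflexive(xs, R) and is_symmetric(xs, R) and is_transitive(xs, R)


# Sample relations, written as 0/1-valued functions.
def equal(x, y):
    return int(x == y)


def not_equal(x, y):
    return int(x != y)


def less(x, y):
    return int(x < y)


def less_or_equal(x, y):
    return int(x <= y)
```

On X = range(−3, 4), for example, `is_equivalence(X, equal)` is True, while `less` is transitive but neither reflexive nor symmetric.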


EXAMPLE 5. (a) On any set, equality is an equivalence relation. In
other words, if xRy means x = y, then R is an equivalence relation. For,
x = x; if x = y then y = x; if x = y and y = z then x = z. The relation
‘x ≠ y’ is symmetric, but (on any set with at least two elements) neither
reflexive nor transitive.

(b) Let X be the set of real numbers, and suppose xRy means x < y.
Then R is not an equivalence relation. It is transitive, but it is neither
reflexive nor symmetric. The relation ‘x ≤ y’ is reflexive and transitive,
but not symmetric.

(c) Let E be the Euclidean plane, and let X be the set of all triangles
in the plane E. Then congruence is an equivalence relation on X, that is,
‘T₁ ≅ T₂’ (T₁ is congruent to T₂) is an equivalence relation on the set of
all triangles in a plane.

(d) Let X be the set of all integers:

$$\ldots, -2, -1, 0, 1, 2, \ldots$$

Let n be a fixed positive integer. Define a relation Rₙ on X by: xRₙy
if and only if (x − y) is divisible by n. The relation Rₙ is called
congruence modulo n. Instead of xRₙy, one usually writes

$$x \equiv y, \bmod n \qquad (x \text{ is congruent to } y \text{ modulo } n)$$

when (x − y) is divisible by n. For each positive integer n, congruence
modulo n is an equivalence relation on the set of integers.

(e) Let X and Y be sets and f a function from X into Y. We define
a relation R on X by: x₁Rx₂ if and only if f(x₁) = f(x₂). It is easy to verify
that R is an equivalence relation on the set X. As we shall see, this one
example actually encompasses all equivalence relations.




Suppose R is an equivalence relation on the set X. If x is an element
of X, we let E(x; R) denote the set of all elements y in X such that xRy.
This set E(x; R) is called the equivalence class of x (for the equivalence
relation R). Since R is an equivalence relation, the equivalence classes
have the following properties:


(1) Each E(x; R) is non-empty; for, since xRx, the element x belongs
to E(x; R).

(2) Let x and y be elements of X. Since R is symmetric, y belongs to
E(x; R) if and only if x belongs to E(y; R).

(3) If x and y are elements of X, the equivalence classes E(x; R) and
E(y; R) are either identical or they have no members in common. First,
suppose xRy. Let z be any element of E(x; R), i.e., an element of X such
that xRz. Since R is symmetric, we also have zRx. By assumption xRy,
and because R is transitive, we obtain zRy, and hence, by symmetry, yRz.
This shows that any member of E(x; R) is a member of E(y; R). By the
symmetry of R, we likewise see that any member of E(y; R) is a member
of E(x; R); hence E(x; R) = E(y; R). Now we argue that if the relation
xRy does not hold, then E(x; R) ∩ E(y; R) is empty. For, if z is in both
these equivalence classes, we have xRz and yRz, thus xRz and zRy, thus xRy.


If we let ℱ be the family of equivalence classes for the equivalence
relation R, we see that (1) each set in the family ℱ is non-empty, (2) each
element x of X belongs to one and only one of the sets in the family ℱ,
(3) xRy if and only if x and y belong to the same set in the family ℱ.
Briefly, the equivalence relation R subdivides X into the union of a family
of non-overlapping (non-empty) subsets. The argument also goes in the
other direction. Suppose ℱ is any family of subsets of X which satisfies
conditions (1) and (2) immediately above. If we define a relation R by (3),
then R is an equivalence relation on X and ℱ is the family of equivalence
classes for R.
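Both directions of this correspondence can be sketched in Python for a finite set X. The code is our own illustration, assuming that R really is an equivalence relation (otherwise the output has no meaning); `equivalence_classes` and `relation_from_family` are hypothetical names. The first function builds the family of classes E(x; R); the second rebuilds R from a family satisfying (1) and (2) by means of condition (3).

```python
from typing import Callable, Hashable, Iterable, List, Set

Relation = Callable[[Hashable, Hashable], int]


def equivalence_classes(X: Iterable[Hashable], R: Relation) -> List[Set[Hashable]]:
    """Split a finite set X into the equivalence classes E(x; R)."""
    classes: List[Set[Hashable]] = []
    for x in X:
        for cls in classes:
            # Classes are identical or disjoint, so one representative suffices.
            representative = next(iter(cls))
            if R(representative, x):
                cls.add(x)
                break
        else:
            classes.append({x})  # x is related to nothing seen so far: a new class
    return classes


def relation_from_family(family: Iterable[Set[Hashable]]) -> Relation:
    """Define xRy iff x and y lie in the same set of the family (condition (3))."""
    index = {x: i for i, block in enumerate(family) for x in block}
    return lambda x, y: int(index[x] == index[y])
```

With congruence modulo 3 on {0, 1, ..., 9}, for example, the classes are {0, 3, 6, 9}, {1, 4, 7}, and {2, 5, 8}, and rebuilding the relation from this family gives congruence modulo 3 back.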


EXAMPLE 6. Let us see what the equivalence classes are for the
equivalence relations in Example 5.


(a) If R is equality on the set X, then the equivalence class of the
element x is simply the set {x}, whose only member is x.

(b) If X is the set of all triangles in a plane, and R is the congruence 
relation, about all one can say at the outset is that the equivalence class 
of the triangle T consists of all triangles which are congruent to T. One of 
the tasks of plane geometry is to give other descriptions of these equivalence 
classes. 

(c) If X is the set of integers and Rₙ is the relation ‘congruence
modulo n,’ then there are precisely n equivalence classes. Each integer
x is uniquely expressible in the form x = qn + r, where q and r are integers
and 0 ≤ r ≤ n − 1. This shows that each x is congruent modulo n to
exactly one of the n integers 0, 1, 2, ..., n − 1. The equivalence classes
are

$$\begin{aligned}
E_0 &= \{\ldots, -2n, -n, 0, n, 2n, \ldots\} \\
E_1 &= \{\ldots, 1 - 2n, 1 - n, 1, 1 + n, 1 + 2n, \ldots\} \\
&\;\;\vdots \\
E_{n-1} &= \{\ldots, n - 1 - 2n, n - 1 - n, n - 1, n - 1 + n, n - 1 + 2n, \ldots\}.
\end{aligned}$$
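Python's built-in `divmod` carries out exactly the division step x = qn + r with 0 ≤ r ≤ n − 1 when n > 0, because it rounds the quotient down (so r is non-negative even for negative x). The sketch below is our own illustration; the names `residue` and `classes_mod_n` are hypothetical, and since the classes Eᵣ are infinite, it can only list their members inside a finite window of integers.

```python
def residue(x: int, n: int) -> int:
    """Return r with x = q*n + r and 0 <= r <= n - 1 (n a positive integer)."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    q, r = divmod(x, n)
    # divmod floors q, so r is never negative when n > 0, even for negative x.
    assert x == q * n + r and 0 <= r < n
    return r


def classes_mod_n(window, n):
    """Sort the integers of a finite window into the classes E_0, ..., E_{n-1}."""
    classes = {r: [] for r in range(n)}
    for x in window:
        classes[residue(x, n)].append(x)
    return classes


print(classes_mod_n(range(-6, 7), 3))
# {0: [-6, -3, 0, 3, 6], 1: [-5, -2, 1, 4], 2: [-4, -1, 2, 5]}
```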

(d) Suppose X and Y are sets, f is a function from X into Y, and R
is the equivalence relation defined by: x₁Rx₂ if and only if f(x₁) = f(x₂).
The equivalence classes for R are just the largest subsets of X on which
f is ‘constant.’ Another description of the equivalence classes is this. They
are in 1:1 correspondence with the members of the range of f. If y is in
the range of f, the set of all x in X such that f(x) = y is an equivalence
class for R; and this defines a 1:1 correspondence between the members
of the range of f and the equivalence classes of R.
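For a finite set X, this description translates directly into grouping elements by their value under f. The Python sketch below is our own illustration (the name `fibers` is hypothetical): the keys of the returned dictionary are the members of the range of f, and each value is the equivalence class {x in X : f(x) = y}, which makes the 1:1 correspondence explicit.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Set


def fibers(X: Iterable[Hashable],
           f: Callable[[Hashable], Hashable]) -> Dict[Hashable, Set[Hashable]]:
    """Group a finite set X by the value of f.

    Each value y in the range of f is a key, and its entry is the class
    {x in X : f(x) == y}; these are the classes of x1 R x2 <=> f(x1) == f(x2).
    """
    groups: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for x in X:
        groups[f(x)].add(x)
    return dict(groups)
```

With f(x) = x² on {−3, ..., 3}, for example, the classes are {0}, {−1, 1}, {−2, 2}, and {−3, 3}, one for each of the values 0, 1, 4, 9 in the range of f.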

Let us make one more comment about equivalence relations. Given
an equivalence relation R on X, let ℱ be the family of equivalence classes
for R. The association of the equivalence class E(x; R) with the element
x defines a function f from X into ℱ (indeed, onto ℱ):

f(x) = E(x; R).

This shows that R is the equivalence relation associated with a function
whose domain is X, as in Example 5(e). What this tells us is that every
equivalence relation on the set X is determined as follows. We have a rule
(function) f which associates with each element x of X an object f(x),
and xRy if and only if f(x) = f(y). Now one should think of f(x) as some
property of x, so that what the equivalence relation does (roughly) is to 
lump together all those elements of X which have this property in com- 
mon. If the object f(x) is the equivalence class of x, then all one has said 
is that the common property of the members of an equivalence class is 
that they belong to the same equivalence class. Obviously this doesn’t 
say much. Generally, there are many different functions f which deter- 
mine the given equivalence relation as above, and one objective in the 
study of equivalence relations is to find such an f which gives a meaningful 
and elementary description of the equivalence relation. In Section A.5 
we shall see how this is accomplished for a few special equivalence rela- 
tions which arise in linear algebra. 


A.4. Quotient Spaces 


Let V be a vector space over the field F, and let W be a subspace of
V. There are, in general, many subspaces W′ which are complementary
to W, i.e., subspaces with the property that V = W ⊕ W′. If we have
an inner product on V, and W is finite-dimensional, there is a particular
subspace which one would probably call the ‘natural’ complementary
subspace for W. This is the orthogonal complement of W. But, if V has
no structure in addition to its vector space structure, there is no way of
selecting a subspace W′ which one could call the natural complementary
subspace for W. However, one can construct from V and W a vector space
V/W, known as the ‘quotient’ of V and W, which will play the role of the
natural complement to W. This quotient space is not a subspace of V,
and so it cannot actually be a subspace complementary to W; but it is
a vector space defined only in terms of V and W, and has the property
that it is isomorphic to any subspace W′ which is complementary to W.

Let W be a subspace of the vector space V. If α and β are vectors
in V, we say that α is congruent to β modulo W if the vector (α − β)
is in the subspace W. If α is congruent to β modulo W, we write

$$\alpha \equiv \beta, \bmod W.$$

Now congruence modulo W is an equivalence relation on V.


(1) α ≡ α, mod W, because α − α = 0 is in W.

(2) If α ≡ β, mod W, then β ≡ α, mod W. For, since W is a subspace
of V, the vector (α − β) is in W if and only if (β − α) is in W.

(3) If α ≡ β, mod W, and β ≡ γ, mod W, then α ≡ γ, mod W. For,
if (α − β) and (β − γ) are in W, then α − γ = (α − β) + (β − γ) is in W.

The equivalence classes for this equivalence relation are known as
the cosets of W. What is the equivalence class (coset) of a vector α? It
consists of all vectors β in V such that (β − α) is in W, that is, all vectors
β of the form β = α + γ, with γ in W. For this reason, the coset of the
vector α is denoted by

$$\alpha + W.$$


It is appropriate to think of the coset of α relative to W as the set of
vectors obtained by translating the subspace W by the vector α. To
picture these cosets, the reader might think of the following special case.
Let V be the space R², and let W be a one-dimensional subspace of V.
If we picture V as the Euclidean plane, W is a straight line through the
origin. If α = (x₁, x₂) is a vector in V, the coset α + W is the straight line
which passes through the point (x₁, x₂) and is parallel to W.
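In this picture, deciding whether two vectors lie in the same coset means deciding whether β − α lies on the line W, that is, whether β − α is a multiple of a non-zero vector w spanning W. The Python sketch below is our own illustration; it assumes rational coordinates so that `Fraction` arithmetic is exact, and the name `same_coset` is hypothetical.

```python
from fractions import Fraction


def same_coset(alpha, beta, w):
    """Decide whether alpha + W = beta + W, where W is the line in R^2 spanned by w.

    alpha ≡ beta mod W exactly when beta - alpha is a scalar multiple of w,
    i.e. when the determinant of the 2 x 2 matrix with columns beta - alpha
    and w is zero.
    """
    w1, w2 = Fraction(w[0]), Fraction(w[1])
    if w1 == 0 and w2 == 0:
        raise ValueError("w must be non-zero so that W is one-dimensional")
    d1 = Fraction(beta[0]) - Fraction(alpha[0])
    d2 = Fraction(beta[1]) - Fraction(alpha[1])
    return d1 * w2 - d2 * w1 == 0


# W is the line x2 = 2*x1, spanned by w = (1, 2).
print(same_coset((0, 1), (1, 3), (1, 2)))  # True: (1, 3) - (0, 1) = (1, 2) lies in W
print(same_coset((0, 1), (1, 1), (1, 2)))  # False: (1, 0) is not on the line W
```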

The collection of all cosets of W will be denoted by V/W. We now 
define a vector addition and scalar multiplication on V/W as follows: 


$$\begin{aligned}
(\alpha + W) + (\beta + W) &= (\alpha + \beta) + W \\
c(\alpha + W) &= (c\alpha) + W.
\end{aligned}$$


In other words, the sum of the coset of α and the coset of β is the coset of
(α + β), and the product of the scalar c and the coset of α is the coset of
the vector cα. Now many different vectors in V will have the same coset
relative to W, and so we must verify that the sum and product above
depend only upon the cosets involved. What this means is that we must
show the following:


(1) If α ≡ α′, mod W, and β ≡ β′, mod W, then
α + β ≡ α′ + β′, mod W.
(2) If α ≡ α′, mod W, then cα ≡ cα′, mod W.


These facts are easy to verify. (1) If α − α′ is in W and β − β′ is in
W, then since (α + β) − (α′ + β′) = (α − α′) + (β − β′), we see that
α + β is congruent to α′ + β′ modulo W. (2) If α − α′ is in W and c is
any scalar, then cα − cα′ = c(α − α′) is in W.

It is now easy to verify that V/W, with the vector addition and scalar 
multiplication defined above, is a vector space over the field F. One must 
directly check each of the axioms for a vector space. Each of the properties 
of vector addition and scalar multiplication follows from the corresponding 
property of the operations in V. One comment should be made. The zero 
vector in V/W will be the coset of the zero vector in V. In other words, 
W is the zero vector in V/W. 

The vector space V/W is called the quotient (or difference) of V
and W. There is a natural linear transformation Q from V onto V/W.
It is defined by Q(α) = α + W. One should see that we have defined
the operations in V/W just so that this transformation Q would be linear.
Note that the null space of Q is exactly the subspace W. We call Q the
quotient transformation (or quotient mapping) of V onto V/W.
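A small numerical check of well-definedness, in one concrete case of our own choosing: take V = R² with rational coordinates and W the line spanned by (1, 1). We label each coset by the number x₁ − x₂, an assumption made for illustration; it does not change when a multiple of (1, 1) is added, and it vanishes exactly on W. The Python sketch below (all names hypothetical) confirms that the coset operations give the same result for different representatives, and that vectors of W are sent to the zero vector of V/W.

```python
from fractions import Fraction


def coset_label(alpha):
    """Return x1 - x2, a number depending only on the coset alpha + W, W = span{(1, 1)}."""
    x1, x2 = alpha
    return Fraction(x1) - Fraction(x2)


def add(alpha, beta):
    return (alpha[0] + beta[0], alpha[1] + beta[1])


def scale(c, alpha):
    return (c * alpha[0], c * alpha[1])


alpha, alpha_prime = (3, 1), (8, 6)  # alpha' = alpha + 5*(1, 1): same coset
beta, beta_prime = (0, 2), (-1, 1)   # beta' = beta - (1, 1): same coset

# The sum of cosets does not depend on the representatives chosen.
print(coset_label(add(alpha, beta)) == coset_label(add(alpha_prime, beta_prime)))  # True
# Neither does the scalar multiple.
half = Fraction(1, 2)
print(coset_label(scale(half, alpha)) == coset_label(scale(half, alpha_prime)))  # True
# Vectors of W lie in the coset of 0, which is the zero vector of V/W.
print(coset_label((2, 2)) == coset_label((0, 0)))  # True
```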

The relation between the quotient space V/W and subspaces of V 
which are complementary to W can now be stated as follows. 


Theorem. Let W be a subspace of the vector space V, and let Q be the
quotient mapping of V onto V/W. Suppose W′ is a subspace of V. Then
V = W ⊕ W′ if and only if the restriction of Q to W′ is an isomorphism
of W′ onto V/W.


Proof. Suppose V = W ⊕ W′. This means that each vector α in
V is uniquely expressible in the form α = γ + γ′, with γ in W and γ′ in
W′. Then Qα = Qγ + Qγ′ = Qγ′, that is, α + W = γ′ + W. This shows
that Q maps W′ onto V/W, i.e., that Q(W′) = V/W. Also Q is 1:1 on W′;
for suppose γ′₁ and γ′₂ are vectors in W′ and that Qγ′₁ = Qγ′₂. Then
Q(γ′₁ − γ′₂) = 0 so that γ′₁ − γ′₂ is in W. This vector is also in W′, which
is disjoint from W (that is, W ∩ W′ = {0}); hence γ′₁ − γ′₂ = 0. The
restriction of Q to W′ is therefore a one-one linear transformation of W′
onto V/W.

Suppose W′ is a subspace of V such that Q is one-one on W′ and
Q(W′) = V/W. Let α be a vector in V. Then there is a vector γ′ in W′
such that Qγ′ = Qα, i.e., γ′ + W = α + W. This means that α = γ + γ′
for some vector γ in W. Therefore V = W + W′. To see that W and W′
are disjoint, suppose γ is in both W and W′. Since γ is in W, we have
Qγ = 0. But Q is 1:1 on W′, and so it must be that γ = 0. Thus we have
V = W ⊕ W′. ∎
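The theorem can be watched at work in the plane. Take V = R² with rational coordinates (an assumption, so that `Fraction` arithmetic is exact), W the line spanned by w, and W′ the line spanned by w′. Then W′ is complementary to W exactly when w and w′ are independent, and the W′-component γ′ of α is the one vector of W′ lying in the coset α + W. The Python sketch below, with the hypothetical name `decompose`, finds the decomposition α = γ + γ′ by Cramer's rule.

```python
from fractions import Fraction


def decompose(alpha, w, w_prime):
    """Write alpha = gamma + gamma_prime, gamma in W = span{w}, gamma_prime in W' = span{w'}.

    In R^2 the lines W and W' are complementary exactly when det[w, w'] != 0;
    Cramer's rule then gives the unique a, b with alpha = a*w + b*w'.
    """
    det = Fraction(w[0] * w_prime[1] - w[1] * w_prime[0])
    if det == 0:
        raise ValueError("W' is not complementary to W")
    a = (alpha[0] * w_prime[1] - alpha[1] * w_prime[0]) / det
    b = (w[0] * alpha[1] - w[1] * alpha[0]) / det
    gamma = (a * w[0], a * w[1])
    gamma_prime = (b * w_prime[0], b * w_prime[1])
    return gamma, gamma_prime
```

For example, `decompose((3, 5), (1, 1), (1, 0))` gives γ = (5, 5) and γ′ = (−2, 0) (as `Fraction` values), and (3, 5) − (−2, 0) = (5, 5) indeed lies in W, so γ′ + W = α + W.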


What this theorem really says is that W′ is complementary to W if
and only if W′ is a subspace which contains exactly one element from each
coset of W. It shows that when V = W ⊕ W′, the quotient mapping Q
‘identifies’ W′ with V/W. Briefly, (W ⊕ W′)/W is isomorphic to W′ in
a ‘natural’ way.

One rather obvious fact should be noted. If W is a subspace of the 
finite-dimensional vector space V, then 


dim W + dim (V/W) = dim V. 


One can see this from the above theorem. Perhaps it is easier to observe 
that what this dimension formula says is 


nullity (Q) + rank (Q) = dim V. 
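The dimension formula can be checked by machine in a concrete case. The Python sketch below is our own illustration, assuming V = Qⁿ (rational coordinates, so `Fraction` arithmetic is exact) and W given by a spanning list of vectors. It computes dim W as a rank by Gaussian elimination, then greedily adds standard basis vectors eᵢ to obtain a complementary subspace W′; by the theorem above, W′ is isomorphic to V/W, so the number of vectors chosen is dim (V/W). The names `rank` and `complement_basis` are hypothetical.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivots found so far
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def complement_basis(spanning, n):
    """Choose standard basis vectors e_i of Q^n spanning a complement W' of W = span(spanning)."""
    current = [list(v) for v in spanning]
    chosen = []
    for i in range(n):
        e = [1 if j == i else 0 for j in range(n)]
        if rank(current + [e]) > rank(current):  # e_i is not in W + span(chosen)
            current.append(e)
            chosen.append(e)
    return chosen
```

For W spanned by (1, 1, 0) and (2, 2, 0) in Q³, `rank` gives dim W = 1 and `complement_basis` picks e₁ and e₃, so dim (V/W) = 2 and 1 + 2 = 3 = dim V.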


It is not our object here to give a detailed treatment of quotient 
spaces. But there is one fundamental result which we should prove. 


Theorem. Let V and Z be vector spaces over the field F. Suppose T is 
a linear transformation of V onto Z. If W is the null space of T, then Z is 
isomorphic to V/W. 


Proof. We define a transformation U from V/W into Z by
U(α + W) = Tα. We must verify that U is well defined, i.e., that if
α + W = β + W then Tα = Tβ. This follows from the fact that W is
the null space of T; for, α + W = β + W means α − β is in W, and this
happens if and only if T(α − β) = 0. This shows not only that U is well
defined, but also that U is one-one.
It is now easy to verify that U is linear and sends V/W onto Z,
because T is a linear transformation of V onto Z. ∎


A.5. Equivalence Relations in Linear Algebra


We shall consider some of the equivalence relations which arise in 
the text of this book. This is just a sampling of such relations. 


(1) Let m and n be positive integers and F a field. Let X be the set
of all m × n matrices over F. Then row-equivalence is an equivalence
relation on the set X. The statement ‘A is row-equivalent to B’ means
that A can be obtained from B by a finite succession of elementary row
operations. If we write A ~ B for ‘A is row-equivalent to B,’ then it is not
difficult to check the properties (i) A ~ A; (ii) if A ~ B, then B ~ A;
(iii) if A ~ B and B ~ C, then A ~ C. What do we know about this
equivalence relation? Actually, we know a great deal. For example, we
know that A ~ B if and only if A = PB for some invertible m × m
matrix P; or, A ~ B if and only if the homogeneous systems of linear
equations AX = 0 and BX = 0 have the same solutions. We also have
very explicit information about the equivalence classes for this relation.
Each m × n matrix A is row-equivalent to one and only one row-reduced
echelon matrix. What this says is that each equivalence class for this
relation contains precisely one row-reduced echelon matrix R; the equivalence
class determined by R consists of all matrices A = PR, where P is an
invertible m × m matrix. One can also think of this description of the
equivalence classes in the following way. Given an m × n matrix A, we
have a rule (function) f which associates with A the row-reduced echelon
matrix f(A) which is row-equivalent to A. Row-equivalence is completely
determined by f. For, A ~ B if and only if f(A) = f(B), i.e., if and only
if A and B have the same row-reduced echelon form.
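The function f described here can be computed. The Python sketch below is our own illustration, assuming the entries are rational numbers so that `Fraction` arithmetic is exact; `rref` and `row_equivalent` are hypothetical names. `rref` performs elementary row operations until the matrix is row-reduced echelon, and `row_equivalent` then tests A ~ B by comparing f(A) with f(B), as in the text.

```python
from fractions import Fraction


def rref(rows):
    """Return the row-reduced echelon form of a matrix over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # index of the next pivot row
    ncols = len(m[0]) if m else 0
    for c in range(ncols):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue  # no leading entry in this column
        m[r], m[pivot] = m[pivot], m[r]  # interchange two rows
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]  # make the leading entry 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:  # clear the rest of the pivot column
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return m


def row_equivalent(A, B):
    """For m x n matrices A and B: A ~ B iff they have the same row-reduced echelon form."""
    return rref(A) == rref(B)
```

For example, `row_equivalent([[1, 2], [3, 4]], [[1, 0], [0, 1]])` is True, since both matrices reduce to the 2 × 2 identity.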

(2) Let n be a positive integer and F a field. Let X be the set of all
n × n matrices over F. Then similarity is an equivalence relation on X:
each n × n matrix A is similar to itself; if A is similar to B, then B is
similar to A; if A is similar to B and B is similar to C, then A is similar to
C. We know quite a bit about this equivalence relation too. For example,
A is similar to B if and only if A and B represent the same linear operator
on Fⁿ in (possibly) different ordered bases. But we know something much
deeper than this. Each n × n matrix A over F is similar (over F) to one
and only one matrix which is in rational form (Chapter 7). In other words,
each equivalence class for the relation of similarity contains precisely one
matrix which is in rational form. A matrix in rational form is determined
by a k-tuple (p₁, ..., pₖ) of monic polynomials having the property that
pⱼ₊₁ divides pⱼ, j = 1, ..., k − 1. Thus, we have a function f which
associates with each n × n matrix A a k-tuple f(A) = (p₁, ..., pₖ)
satisfying the divisibility condition pⱼ₊₁ divides pⱼ. And, A and B are
similar if and only if f(A) = f(B).

(3) Here is a special case of (2) above. Let X be the set of
3 × 3 matrices over a field F. We consider the relation of similarity on X.
If A and B are 3 × 3 matrices over F, then A and B are similar if and
only if they have the same characteristic polynomial and the same minimal
polynomial. Attached to each 3 × 3 matrix A, we have a pair (f, p) of
monic polynomials satisfying

(a) deg f = 3,
(b) p divides f,
(c) every prime factor of f divides p,

f being the characteristic polynomial for A, and p the minimal polynomial
for A. (Condition (c) holds because the characteristic and minimal
polynomials always have the same prime factors.) Given monic polynomials
f and p over F which satisfy (a), (b), and (c), it is easy to exhibit a 3 × 3
matrix over F having f and p as its characteristic and minimal polynomials,
respectively. What all this tells us is the following. If we consider the
relation of similarity on the set of 3 × 3 matrices over F, the equivalence
classes are in one-one correspondence with ordered pairs (f, p) of monic
polynomials over F which satisfy (a), (b), and (c).
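For 3 × 3 matrices this criterion is completely computable. The Python sketch below is our own illustration, assuming rational entries so that `Fraction` arithmetic is exact. `char_poly` expands det(xI − A) as x³ − (trace A)x² + (sum of the principal 2 × 2 minors)x − det A. `min_poly` finds the first power Aᵏ that is a linear combination of I, A, ..., Aᵏ⁻¹, which by the Cayley-Hamilton theorem happens for some k ≤ 3. Polynomials are returned as coefficient tuples, leading coefficient first, and all function names are hypothetical.

```python
from fractions import Fraction


def to_fractions(A):
    return [[Fraction(x) for x in row] for row in A]


def mat_mul(A, B):
    """Product of two 3 x 3 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def char_poly(A):
    """Coefficients (leading first) of det(xI - A) for a 3 x 3 matrix A."""
    a = to_fractions(A)
    trace = a[0][0] + a[1][1] + a[2][2]
    minors = (a[0][0] * a[1][1] - a[0][1] * a[1][0]
              + a[0][0] * a[2][2] - a[0][2] * a[2][0]
              + a[1][1] * a[2][2] - a[1][2] * a[2][1])
    det = (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
           - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
           + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    return (Fraction(1), -trace, minors, -det)


def solve_combination(columns, target):
    """Return c with sum(c[i] * columns[i]) == target, or None if there is none.

    Assumes the given column vectors are linearly independent.
    """
    k = len(columns)
    rows = [[col[e] for col in columns] + [target[e]] for e in range(len(target))]
    for c in range(k):
        pivot = next(i for i in range(c, len(rows)) if rows[i][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        lead = rows[c][c]
        rows[c] = [x / lead for x in rows[c]]
        for i in range(len(rows)):
            if i != c and rows[i][c] != 0:
                factor = rows[i][c]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[c])]
    # A leftover equation 0 = (non-zero) means the system is inconsistent.
    if any(row[k] != 0 for row in rows[k:]):
        return None
    return [rows[i][k] for i in range(k)]


def min_poly(A):
    """Coefficients (leading first) of the minimal polynomial of a 3 x 3 matrix A."""
    a = to_fractions(A)
    powers = [to_fractions([[1, 0, 0], [0, 1, 0], [0, 0, 1]])]  # I, A, ..., A^(k-1)
    for k in range(1, 4):
        target = mat_mul(powers[-1], a)  # A^k
        flat = [[x for row in P for x in row] for P in powers]
        coeffs = solve_combination(flat, [x for row in target for x in row])
        if coeffs is not None:
            # A^k = c_0 I + ... + c_(k-1) A^(k-1), so p(x) = x^k - c_(k-1) x^(k-1) - ... - c_0.
            return (Fraction(1),) + tuple(-c for c in reversed(coeffs))
        powers.append(target)  # A^k is independent of the lower powers
    raise AssertionError("unreachable: by Cayley-Hamilton, k <= 3")


def similar_3x3(A, B):
    """3 x 3 matrices are similar iff they share characteristic and minimal polynomials."""
    return char_poly(A) == char_poly(B) and min_poly(A) == min_poly(B)
```

For example, diag(1, 1, 2) and the matrix with rows (1, 1, 0), (0, 1, 0), (0, 0, 2) share the characteristic polynomial x³ − 4x² + 5x − 2, but their minimal polynomials are x² − 3x + 2 and x³ − 4x² + 5x − 2 respectively, so they are not similar.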
A.6. The Axiom of Choice


Loosely speaking, the Axiom of Choice is a rule (or principle) of
thinking which says that, given a family of non-empty sets, we can choose
one element out of each set. To be more precise, suppose that we have
an index set A and for each α in A we have an associated set S_α, which is
non-empty. To ‘choose’ one member of each S_α means to give a rule f
which associates with each α some element f(α) in the set S_α. The Axiom
of Choice says that this is possible, i.e., given the family of sets {S_α}, there
exists a function f from A into

$$\bigcup_{\alpha \in A} S_\alpha$$

such that f(α) is in S_α for each α. This principle is accepted by most
mathematicians, although many situations arise in which it is far from clear
how any explicit function f can be found.
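By contrast, when the index set A is finite and each S_α is a finite set of integers, no axiom is needed: an explicit rule such as ‘take the smallest element’ already is a choice function. The Python sketch below, with the hypothetical name `choose_min`, illustrates only this easy case; the Axiom of Choice matters for families where no such explicit rule may be available.

```python
from typing import Dict, Hashable, Set


def choose_min(family: Dict[Hashable, Set[int]]) -> Dict[Hashable, int]:
    """Explicit choice function for a finite family {S_alpha} of non-empty finite sets of integers."""
    for alpha, s in family.items():
        if not s:
            raise ValueError(f"S_{alpha} is empty, so no element can be chosen")
    # f(alpha) = min S_alpha, which certainly lies in S_alpha.
    return {alpha: min(s) for alpha, s in family.items()}
```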

The Axiom of Choice has some startling consequences. Most of them 
have little or no bearing on the subject matter of this book; however, one 
consequence is worth mentioning: Every vector space has a basis. For 
example, the field of real numbers has a basis, as a vector space over the 
field of rational numbers. In other words, there is a subset S of R which 
is linearly independent over the field of rationals and has the property 
that each real number is a rational linear combination of some finite 
number of elements of S. We shall not stop to derive this vector space 
result from the Axiom of Choice. For a proof, we refer the reader to the 
book by Kelley in the bibliography. 